PurePerformance - Keptn – A Technical “Behind the Scenes Look” with Dirk Wallerstorfer

Episode Date: July 8, 2019

Keptn (@keptnProject) is an open source control plane for Kubernetes enabling continuous delivery and automated operations. In this session we chat with Dirk Wallerstorfer (@wall_dirk), who is leading the Keptn development team. We learn from Dirk why they chose Knative as the serverless framework that lets Keptn connect to other DevOps tools in the toolchain, how the event-driven architecture works, which use cases are supported, and where the road is heading. If you are interested, also check out our Getting Started with Keptn YouTube tutorial, join the Keptn Slack channel, keep an eye on the Keptn community, and give feedback after trying out Keptn yourself by following the installation instructions: https://keptn.sh/docs/

Links:
Keptn on Twitter - https://twitter.com/keptnProject
Dirk on Twitter - https://twitter.com/wall_dirk
Knative - https://cloud.google.com/knative/
Keptn Video - https://www.youtube.com/watch?v=0vXURzikTac
Keptn Slack - https://keptn.slack.com/join/shared_invite/enQtNTUxMTQ1MzgzMzUxLTcxMzE0OWU1YzU5YjY3NjFhYTJlZTNjOTZjY2EwYzQyYWRkZThhY2I3ZDMzN2MzOThkZjIzOTdhOGViMDNiMzI
Keptn Community - https://github.com/keptn/community
Keptn Docs - https://keptn.sh/docs/

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello and welcome to another episode of Pure Performance. My name is Brian Wilson, and as always, I have my co-host Andy Grabner here with me, hot off the presses, literally just finished doing a webinar on the same topic we're talking about. So this should be probably the most interesting podcast we've ever done, ever, period. No pressure. Hey, Andy, how are you doing?
Starting point is 00:00:49 I'm really flattered now that you actually think that I have a good webinar. And actually, the person standing very close to me because we're sharing the same microphone was also part of the webinar. So the pressure is on, but we accept the challenge. Oh, yes, we do. Hi, everyone from my side. This is Dirk from Dynatrace also. Hi, Dirk.
Starting point is 00:01:11 Hey. So you all just finished doing a bit on Captain F. You remember, Andy, a few, actually, I guess a few months ago now, but several episodes back, we talked to Alex Reitbauer of Dynatrace, and he gave us a nice let's say higher level view about what captain is uh you know we did get into some of the detail of it obviously but we promised we would be back with dirk to get into the nitty-gritty of it in the inner workings and that's what we're going to do today am i correct or did i completely miss the memo on
Starting point is 00:01:40 what we're really doing no no that's exactly what is. I think we really wanted to give more people the chance to learn more about Captain and really understand, for instance, what I asked Dirk today, you know, why did we chose technologies like Knative? On which platforms are we running? What type of tools does Captain interact with? And I think these are some of the questions I keep getting when I, you know, fly around and keep, you know, talks at meetups or conferences. These are the questions that I get.
Starting point is 00:02:10 And some of them, you know, I'm very comfortable answering. Some of them I always then have to check back with Dirk if I'm not saying something rubbish. And that's why it's great to have Dirk here, who is leading the development team on Captain. Awesome. And before we let Dirk introduce himself a little better, I do want to just make sure everyone who's listening understands that Captain is not a Dynatrace product that you pay for in license.
Starting point is 00:02:33 It is an open source project that you can use. You don't even have to use Dynatrace along with it. So please don't get turned off that we're going to be speaking about a Dynatrace product today. This is a, what's the easiest way to say this a pipeline management tool what would you say how would you describe it in like in a little bit like actually let's let dirk do that dirk how would welcome welcome to the show i know you already said hello um give us you know why don't you tell us a little bit your background but before you do that, if you had to give, you know,
Starting point is 00:03:07 Captain is a blank, blank tool, right? How would you fill in those blanks? Captain is a continuous delivery and continuous operations tool. Let's say framework, not tool. Let's go with framework. Okay, perfect. There's two terms that he just threw out there, right? Continuous delivery and continuous operations.
Starting point is 00:03:33 And I think this is the cool thing about that it really supports both aspects. And I know you want Duke to kind of introduce himself, but I also want to say something about this because this also needs a shout out to Alois, because Alois came up with a great analogy on how to differentiate continuous deployment and what we also, how we call it autonomous operations. And he's comparing it with what NASA is doing when they put rockets into space. They have launch control and they have mission control. So launch control be the team that is responsible for making sure the rocket can lift off and they're responsible until the moment when the counter goes to zero.
Starting point is 00:04:07 And at any point in time, they can abort the mission. Well, because they may find some quality issues, environment issues, something goes wrong. And once the rocket lifts off, they switch over to mission control. And those folks are then responsible for the operations of the whole rocket, which means get them up to space, get them safely down again. But they have to deal with different things, and they cannot just simply abort a mission. And I think that's actually the cool way. And we have to give Alois credit for it because he really came up with that analogy. It is a nice analogy. You have to be a real rocket scientist to come up with that analogy.
Starting point is 00:04:42 Okay, so with all that, right, so D dirk we know you're going to be telling us a lot of stuff you're going to be expanding on this but let's let people know a little bit about who you are because probably a lot of our listeners don't know much about you unless they've been already checking out the captain project they probably see your name all over it i'm sure but let us know who you are and how you maybe got involved in Captain and what your favorite color is or something. MARTIN SPLITTMANN- OK, nice. I like colors.
Starting point is 00:05:08 I started working at Dynatrace for close to four years now. And we've been focused on working around autonomous cloud management. It's a service offering we have at Dynatrace that deals with what is actually necessary for a company to become really cloud native to start shipping software with high quality with release very often and produce stable releases and we took the learnings that we ourselves did and created the Captain project out of that and how I came to work on captain well we've actually started the project now for
Starting point is 00:05:47 something like one year ago on the internally and at the start of the year we started with the open source project also available on github right now and that is basically what i spend my time on right now. And my favorite color is blue. Well, thank you. See? I didn't know that. And I think actually you'll probably like it when it switches from green to blue. Yeah.
Starting point is 00:06:13 Oh. And growing up, my favorite color was green. I have to say, I'm not even making that up. So what about you, Andy? What's your favorite color since we're talking about it?
Starting point is 00:06:26 As a kid, I always said red because I like the fire. And I like the, well, not fire is fire, but like the warmth and the passion. Yeah, yeah. No, I think it's still red. Okay. So there we have. We have green and blue deployments, and then the red signals for when something goes wrong, and you know to switch back together.
Starting point is 00:06:45 Perfect. That's awesome. And that actually plays into Captain a little bit. But anyway, let's go on. Andy, you're very familiar with the webinar, so why don't you take over the line of questioning, because obviously you know where we can lead this and all the cool stuff we can let our listeners know about with this. this so i think they um what people keep asking me and first of all where can i find out more about captain but i think that should be very easy and straightforward right sure yeah so we we have an community we have a github repositories um so we have an own organization github.com slash captain we have a repository that shows all of the details about our community in
Starting point is 00:07:26 there we have a twitter handle it's captain project where you can follow us we have a captain at dynatree's email address and we're very proud of our very own slack channel and slack workspace where you can actually also join in, get updates on the continuous work that the team does. So they give updates on the tasks they're working right now. We make short summaries of what we've covered in the last sprint. And of course you get information on the latest released versions and what the features that were added are. And this is also the perfect place to actually engage with the Captain team and with us. And also tell us what you like about Captain. Tell us what you don't like about Captain.
Starting point is 00:08:13 Tell us what you're missing in Captain. Because in the end, what we want to do is build the framework in a way that really solves the real life problem of continuous delivery and continuous operations. And that brings me to the next question, i think some people are when they approach me they say so you are building stuff and i think that's the first big differentiation from captain to ci we're not doing ci we're not building images or artifacts actually the life of a captain trace or whatever we want to we're going to call it but the life of a the life the life of a captain trace or whatever we're going to call it. But the life of a the life of the duty of a captain starts when there's a new artifact. And what so my question to you is, what is an artifact for captain?
Starting point is 00:08:55 And what does it actually then trigger once there's a new artifact available? So, as you said, continuous integration is not a part of Captain. So whenever a new artifact has been built, and this can be a char file, a war file, or a Docker container, if you're already working on cloud native platforms, we get an event. So we get notified about that. And what Keptn then does is it is notified about a new container image, for example, and it starts to update the configuration of the project that you've created. So that's one of the peculiarities of Keptn. It follows the configuration as code approach. So all of the configuration about your application environments and everything is stored in GitHub. And it updates the configuration in GitHub
Starting point is 00:09:53 and then again sends an event to Captain. So as you probably now also know, Captain follows an event-driven architecture, which allows you to decouple the service pretty easily. And then it deploys the artifact using a service that is adequate for deploying that type of artifact that you have sent us. It evaluates certain quality gates that you can also define as code. So following the monitoring as code approach, Andy, that you've been preaching over the last few years, once we decide that the quality gate
Starting point is 00:10:32 is passed or not, we can actually decide if we want to continue promoting this artifact to the next stage. So let's say we started in the death stage, for example. We can decide if we want to promote it to the staging stage or to a hardening stage or whatever you want to name it. And then this whole workflow begins again. And in this workflow, we can also define if we want to do a direct deployment, blue-green deployment, or a canary deployment, and also define the test strategy so only execute functional tests from execute performance tests and so on and so forth and all of that is again stored in a configuration
Starting point is 00:11:15 as code and again in the github repository so there are a few patterns in there that we've covered now but this is the the basic functionality that the framework provides. And I think what's really, the way I always explain it, we can do all of this if we want to by taking a bunch of tools and then writing some custom code that says, once I get a new artifact, then I kick off this Jenkins pipeline for deployment, then I kick off this testing tool through this particular API.
Starting point is 00:11:45 Then I write a Python script or some other script that does another thing. And basically what you explained to me is that Captain takes care of the complete orchestration. That means instead of me, you, 50 other teams within Dynatrace, 10,000 other teams across the world, building their own integration and their own pipeline, their own custom code in order to implement things that we've been preaching about. Automated deployment with different deployment types, canary, blue, green, shadow, dark.
Starting point is 00:12:13 Automating tests, automating quality gates, scaling up, scaling down, self-healing. Instead of implementing this all, let's say, by hand, quote unquote, and then doing this not once, but many, many times. This is why, what at least the big problem Captain solves for me. Because once I've set up Captain, I don't need to write any custom gluing code anymore because Captain orchestrates these tools.
Starting point is 00:12:37 And Captain takes what I produce as a developer, which is an artifact, and then is then automatically getting all the right tools involved to push this artifact in a safe automated fast reliable way through the different stages all the way up to production right and i think that's pretty cool yes it is right well put and hey can i can i can i ask uh make try to simplify it even more i might get it wrong but this is why i want you to correct me because i'm really you know to me i think it's
Starting point is 00:13:05 really important to really know how to explain it on simple levels to people and dirk gave us a good one earlier but i'm trying to think of simplifying it even more if we could we say let's take um if kubernetes kubernetes is to containers as captain is to pipelines, maybe. Like a management layer, an orchestration layer on top of your pipeline. Maybe? I would rather say maybe on top of what people consider a deployment or a release.
Starting point is 00:13:40 You're releasing software. You have a change, and you want to get this change or this deployment through different stages in a safe way out to production so captain orchestrates the way to production and it orchestrates production too because remember launch and mission control getting to production is is already a challenge but then keeping it in production or keeping it's keeping the stuff in production so it doesn't break your business is the next big challenge. And so Captain orchestrates all that. And it is following certain principles.
Starting point is 00:14:13 It is implementing a lot of the use cases we've been talking about for the last couple of years, like the blue-green deployments. How do you do this and how do you monitor it? How do you scale up? On which metrics do you scale up versus scaling down? How do you enforce this and how do you monitor it how do you scale up on which metrics to scale up versus scaling down how to enforce automated quality gates what type of information do you push or pull from a monitoring tool at which particular stage so captain is completely automating all of that but i believe what's also important to understand is that captain itself actually doesn't do the work captain just gives. Captain orders a certain tool with a certain capability to do a certain task. And I think that's the thing.
Starting point is 00:14:51 And the way the workflow, the pipeline looks like is defined in code as well. That's why we have the concept of a shipyard file where you specify you have one, two, three, four stages. What type of deployment strategy, what test strategy do you have, and how do you then decide if you want to promote an artifact? Yes or no, that's all through code. Great, thanks.
Starting point is 00:15:14 Cool. Does it make sense? So, hopefully. Sure, it does. Dirk, does that make sense to you? Do you understand, Captain Dirk? Is this really what you're working on? No, I do. No, I think the No, I do.
Starting point is 00:15:25 No, I think that the Kubernetes analogy works to a certain extent, yeah, if you apply it to. MARK MANDELMANN- Meaning that it's like an orchestration layer on top of this really large complex, what could be out of control system. You can orchestrate it on this high level, as we discussed. But yeah.
Starting point is 00:15:40 DEREK WEBERMANN- You can orchestrate basically anything that has an API. So we got this question also earlier. So what are the requirements of tools that Captain can talk to? And the only requirement this actually is, is that it needs to have an API that can be called from a service out of Captain. Exactly. Also what I like a lot, I don't consider myself a great developer anymore because
Starting point is 00:16:07 i you know i mean kind of i don't spend that much time on engineering anymore so when i got exposed to kubernetes and then to helm and to yaml files and to namespaces and to kubectl and to istio then i thought oh my god in order to deploy a container, I need to understand and learn all of these things where the only thing I really want, I want to deploy my container in a safe way. And I think this is actually what Captain takes away from me when it comes to the launch control part. Because the only thing I tell Captain, hey, Captain, you will see an artifact from me. And if you see a new artifact of, let's say, the front end or the back end service, here is what I want you to do.
Starting point is 00:16:47 Deploy it here, run these tests, evaluate. And if it's good, deploy it in production. And in case it fails in production, then I have a script that you can execute. Or I want you to notify me via Slack, and then I'll take care of it. And I think it takes away all of the configuration. It takes away all of the complexity
Starting point is 00:17:03 of all these many tools and frameworks and need to actually really properly run cloud native apps and Kubernetes because it's not just Kubernetes anymore. I know it's not. Yeah, there's so many tools involved right now for deploying stuff, for testing stuff, for notifying about changes. Right. So, for example, if you consider a tool landscape
Starting point is 00:17:25 where you have like 11 different tools and we have a survey for that, the autonomous cloud survey, where we actually found out that, I think on average, there are like 11 tools in each continuous delivery pipeline. Now, if you want to add Slack integration
Starting point is 00:17:41 into all of those tools, you have to actually go into the tools install the slack plugin if there is one otherwise you need to write the configuration on your own and yeah once you're done i think two or four weeks are down have already passed um and well captain um makes it way easier to add a Slack service in there because coming back to the event-driven architecture, you can subscribe services to certain events. So what you would need to do is write a Slack service, and I actually have done that,
Starting point is 00:18:18 and subscribe it to the channels, and you'll get all of the notifications that you want to get without touching any of the tools that you want to integrate or that you want to have the updates from. Because Captain is the central hub and Captain is the central orchestrator. So everything goes through Captain and then you can connect any other tool to Captain for certain events. The Slack service, by the way, people that are listening, you should really watch the performance clinic we just recorded
Starting point is 00:18:51 because there's some cool little gimmicks that Dirk built into the Slack service for me. I will not spoil it now. Let's say there are many Captains. There are many Captains involved, exactly. So yeah, so this is pretty cool. The other thing, so going a little technical now. So Knative, you decided, or the team decided to bet on Knative.
Starting point is 00:19:18 First of all, quickly, what is Knative, and why did we choose Knative? MARTIN SPLITTINGER- Knative is a framework again that was built by Google and is actively maintained by Google. It builds on Istio and Kubernetes. So that is great because Kubernetes basically has won the pass war, if you want to put it like that. And Google also actively maintains Kubernetes. So we found that it is a good choice to go forward with that. And the reason why we chose Knative
Starting point is 00:19:53 is that it offers several features that we would have had to implement ourselves otherwise. And the core features that we use of Knative are two. There is the serving feature that allows us to scale up and scale down services on demand and also scale Kubernetes services down to zero instances, which is not possible with Kubernetes out of the box. That means we are able to keep the resource footprint that is needed of your continuous delivery pipeline
Starting point is 00:20:26 environment as small as possible. So you don't have a Jenkins instance that is really large and runs several days without you ever using it over the weekends, for example. And the second feature we are using heavily is the Knative eventing feature that actually allows us to decouple all of the components and all of the tools that are part of your continuous delivery pipeline and receive events and forward those events using channels where again services can subscribe to and can work upon receiving an event and And the cool thing of that approach, of that architecture from Knative, because all the events end up in queues,
Starting point is 00:21:09 so this is also the way you can scale. If you have a massive amount of events coming in, because in a large organization, they have hundreds of thousands of projects, and these events eventually queue up, and then they will be processed once Knative can then scale up certain services it needs so hey i have a service that now needs jenkins so if the resources are available i
Starting point is 00:21:31 will scale it up give jenkins that event and i think that's it because the question of scale came up last week when i presented this at a conference and they were saying how do you scale well i said well k-native itself is basically serverless for Kubernetes. So it has scalability architecture in. And that's also one of the reasons why we used it. So I think we took care of that problem right away from the start by choosing that framework. Yeah, we did. And Knative is right now in version 0.6. So it's also a, let's put it it like that it's a young project still and but but still they have
Starting point is 00:22:07 they provide um also like guaranteed storage of your events so you can implement your event storage with kafka or with nuts from the cncf for example um so there are enterprise-ish features already in there and they will get there will get will be even more of those features that then we in turn will also actively use um until the release of k-native 1.0 in autumn so pretty cool so k-native so that explains why k-native that means captain is the orchestrator captain sends events to the channels that are then picked up by, let's call them serverless Captain services, right, for the different tools. So in case anybody wants to write their own integrations for their tools, right, they have, like we had a question today, how would this be integrated with a static code analysis tool? When can we trigger it? And then somebody would just need to write a captain service
Starting point is 00:23:05 that says, please, captain, call me in case you are kind of internally issuing a certain event. Let's say deployment finished or start your tests or evaluate. Then you can decide in which phase of the pipeline, of the workflow you want to be called. And then you can then call your tool and then tell captain what your tool actually thinks about the whole artifact right so there's also in the documentation and there's a captain's website captain.sh yes and you can go to the docs and there's a section on the i think use cases or references that's reference reference yeah
Starting point is 00:23:43 where it says how you can write your own service in case people are interested. And I think you mentioned today, there's also some templates. Yeah. So we have right now one template and it's going to be extended by another one. We have a template for a TypeScript service,
Starting point is 00:23:58 for a TypeScript captain service, and we will have an additional one for a captain service written in Go. The basic requirement for a captain service written in Go. The basic requirement for a captain service still is you need to be able to digest cloud events. So you need to be able to spin up a HTTP endpoint, receive an event, parse the payload, and either do yourself something about that or tell a third-party tool to trigger some action for you. And so you're not limited to any programming language, actually. So that's also pretty neat.
Starting point is 00:24:34 So to give an example, because I was just working with Henrik from Neotis, he's building the Neotis or Neolot Executor Captain Service. That means when Captain says we need to run performance tests, he has a service that gets called in case a start test event comes in. Basically, it's an HTTP endpoint in a Docker container.
Starting point is 00:24:55 Knative makes sure that this Docker container is available at the moment when there's a new event coming in. It passes this event over to Handnuke's code, and in the event, it says, here is the service in this particular this event over to hendrix code and in the event it says uh here is the service in this particular stage you need to test and here's some additional information that the user told us about it now do your work and when you're done you just send an event back to captain about your result tests failed or test passed right so and he's writing the service in Java. He's writing it. Yeah. And there is, so it's really easy to start with.
Starting point is 00:25:28 So cloud events is a specification of how cloud events should look like. So there is just a specification on what fields are mandatory, which fields are optional. And there are also SDKs for cloud events. There are an SDK for Go, for TypeScript, for Java, I guess, for Python. And it's really a good starting point to use one of those SDKs to just get started with writing captain services. But essentially a cloud event is an HTTP call with a JSON based payload. And then that JSON, you have certain metadata in there,
Starting point is 00:26:01 like what type of event and then the payload. That's all it is. That's really to keep it easy or simple. Cool. So Knative, people can write their own services. We have a couple of services right now already that are part of the Captain install. Yeah. And then there's more coming.
Starting point is 00:26:16 And I think we need feedback from the listeners on what are the tools that people are using out there in the cloud native world and where they would like to include them or integrate them with Captain? I think that would be great feedback for us to hear. Definitely, yes. So it's always good to hear in what environments Captain is used and just what use case are you trying to cover with that. And during the discussions, we usually learn about the tools that that the the customers or people are using and this is also a great direction for us and also validating our
Starting point is 00:26:50 concepts so that's the concept that we've built up until now cover this use case cover this tool or for example do we need to add an additional field in the cloud event specification because some tools i don't know want to pass data on from the test execution phase to the validation phase a classical or a typical stage like if you look at the at the pipeline in captain we call them shipyard files we define if a sheetback file where you say stage def, staging, and then production. So in a classical, in one of these stages, I think the minimum things, the opinionated way how Captain looks at the stages,
Starting point is 00:27:36 the first thing that happens is a deployment. Yes. Then optionally, I think tests to be executed. Well, not optionally you can write a service that fakes that it runs a test it just replies back well I'm done
Starting point is 00:27:52 you can't do that deploy, test and then evaluate and then the reason why I mentioned this is because I think evaluate there's also a service that comes with Captain that's called Pitometer and pitometer is really then using again a monitoring or configuration as code approach where in the developer or whoever is responsible for defining the acceptance criteria for a build or for an artifact into the next
Starting point is 00:28:23 stage they can write this acceptance criteria as code and store it in GitHub. And what this means, actually, they write, I need, let's say, CPU under load. After the deployment is done and the tests are run, I want to make sure we're not using more than this type of CPU memory. This is the response time, the failure rate. We can pull in data, for instance, from other tools as well,
Starting point is 00:28:45 like static code analysis tools that we heard today. So this is, it will be configured in a config file. And then Pitometer is the service that we're using right now that is then calculating a deployment score and based on that, promote it yes or no. Yeah. And coming back to the initial point that was brought up, this not only works with Dynatrace, of course. I this not only works with dynatrace of course
Starting point is 00:29:06 i mean it works with dynatrace of course since we're getting paid by dynatrace yeah but we also have a an integration with prometheus with the open source um monitoring um framework of the cncf and yeah the quality gates also work out of the box with using prometheus as a metrics provider so and i think this is also a reach out or a shout out to any other monitoring vendors you know just build your own pitometer extension because pitometer the library itself first of all can also be used completely standalone even outside of captain but the way it works is you have data sources so we have a prometheus data source we have a Dynatrace data source, Neotis is working on the Neotis data source.
Starting point is 00:29:48 I know that our friends from T-Systems, they're currently working on additional data sources. And that's why I know that I want to reach out to everyone out there, whether it's the AppDynamics, the New Relic, the Datadogs, the SignalFX, the Instanas of the world, and I'm sure there's many other out there, just write an integration and then you can be part of an automated quality gate, either in Captain or even standalone because Pitometer can be used standalone as well. Correct. Cool.
Starting point is 00:30:14 So we learned, we know it's Kubernetes, it's Knative-based. You can write your own services. And we talked a lot about how an artifact makes it all the way into production. Now let's talk about the production piece a little bit. So the mission control. So something gets deployed into production,
Starting point is 00:30:34 and then what's the typical remediation use cases? Because I think this is one of the things we've been talking or telling our customers in production you need to think about automated monitoring obviously but then self-healing auto scaling rolling bags like all these self-healing use cases can you tell us a little bit about how captain can actually be told about there's something wrong and what our vision is with captain on how to remediate? Sure. There are a few pieces to that puzzle, actually. So on the one hand, it's a good thing if you build your application already in a way that it can change its behavior to a certain extent if some incident or some event happens. So as an example, if you build out a service that serves dynamic content,
Starting point is 00:31:29 like making a query to a database, calculating some stuff, and sending the result back, maybe there is a failover version of that service that in case the response time gets too high, or you can just switch to a static delivery mode. So just not to get frustrated users that can't use the service anymore. Maybe just turn down the feature set a little bit, but by that being able to serve many more customers.
Starting point is 00:32:01 So writing services in that way is one piece of the puzzle. And we encourage everyone to use feature flags for toggling this behavior. There are frameworks out there for implementing feature flags in your app: there are open source frameworks like Unleash, and there are also commercial feature flag frameworks
Starting point is 00:32:27 like LaunchDarkly. So this is the one piece. Then the second piece, you, of course, need a monitoring solution to actually initially find out that there is something wrong with your app. And the way this works is that usually some threshold is hit or a baseline is not met anymore. And again, the event-driven architecture, Captain gets a problem event, problem notification that something went wrong.
Starting point is 00:32:56 So for instance, in our case, Dynatrace detects an issue, and then instead of sending it to, let's say, Slack or an email, it sends it to Keptn. Exactly. And Keptn then forwards that event to a problem channel. Services are subscribed to that channel, and those services can then act upon it. So the simplest solution would be to have the service receiving the problem event at the end of the channel toggle the feature flag in the affected service, making sure that static content is delivered,
Starting point is 00:33:33 that the response time goes down, and that the users are satisfied. This is the simple version. We actually have two other versions implemented already. One works with Ansible Tower, where you can execute custom runbooks, also parameterized runbooks, automatically to trigger the remediation action. And even more sophisticated, you can start ServiceNow workflows upon receiving a problem event. And the crucial part here is to actually check whether the remediation action that you've taken actually fixes the problem or not. Because just toggling a feature flag and saying, I'm done,
Starting point is 00:34:13 It's not mission control. Yeah, exactly. You wouldn't do it to a rocket where people are in there. I think this fixes it. Push the button and then go for a coffee. So it doesn't work. So that means Captain will then, like with the artifact, where Captain kind of in launch control pushes an artifact through different stages and phases and then can abort it.
Starting point is 00:34:37 The mission kind of at every step with the problem, it's going to be similar, right? There's a problem coming in and then Captain keeps track of kind of the lifetime of a problem. That's kind of be similar, right? There's a problem coming in, and then Captain keeps track of the lifetime of the problem. That's the idea, right? So we can test, we can try different things to mitigate a problem until we finally solved it. So maybe let's look a little bit
Starting point is 00:34:56 into the future of what's to come in Keptn in that regard. So what we will introduce at some point is this: we had the perf spec, where you have configuration as code and define what quality criteria need to be met before you can promote the service to the next stage. We have also been thinking about a heal spec, where you can define: okay, if this problem appears, I want to first try, I don't know, restarting the process. Let's see if that fixes it. It doesn't? Well, let's scale it up, let's add an additional instance of the service and see if that fixes the problem. Still not? Okay, you toggle the feature flag and see if that helps. But it's always: do something, and then check whether it actually fixed the problem or not, and just
Starting point is 00:35:47 like have a four-step approach of what a manual operator would also do. But this is something that you can automate as well, because machines are very good at doing automated stuff, also at 2 a.m. on Saturday morning or Sunday morning, when humans usually like to sleep. At least I do. It depends on my kids, though, but nevertheless. And yeah, this is the path that we are actually going down: defining four different remediation actions that are then checked. And the last step, of course, is usually to escalate to a human, because the continuous or automated operations worker can't handle it anymore
Starting point is 00:36:25 and can't fix the problem on its own. Pretty cool. So that means this covers the two major use cases, launch control and mission control. On the one side, getting an artifact safely into production and enforcing the automated quality gates, the automated testing. We talked about blue-green deployments and canaries, which for me are also part of launch control.
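The staged remediation just described, try an action, re-check the problem, move on only if it is still open, and escalate when automation runs out, can be sketched in a few lines. Everything here, including the action names, is illustrative rather than Keptn's actual heal-spec format.

```python
# Sketch of an escalation ladder: attempt remediations in order and stop as
# soon as a re-check shows the problem is gone; otherwise escalate.

def run_remediation(actions, problem_still_open):
    """Try each (name, action) in order; stop as soon as the problem clears."""
    for name, action in actions:
        action()
        if not problem_still_open():
            return f"resolved by: {name}"
    return "escalated to a human operator"

def simulate(fixed_by):
    """Simulate a problem that only one specific action can fix."""
    state = {"open": True}
    def make(name):
        def act():
            if name == fixed_by:
                state["open"] = False
        return (name, act)
    actions = [make("restart_process"), make("scale_up"),
               make("toggle_feature_flag")]
    return run_remediation(actions, lambda: state["open"])

print(simulate("scale_up"))          # resolved by: scale_up
print(simulate("redeploy_cluster"))  # escalated to a human operator
```

The re-check hook after every step is the part that distinguishes this from fire-and-forget runbooks: nothing counts as fixed until the monitoring data says so.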
Starting point is 00:36:49 That's when it flips over, right, if you do blue-green. And then there's mission control, in case a problem happens later: what can we do to automatically mitigate the problem, self-heal, and then Keptn orchestrates the lifetime of the problem until it's hopefully solved. Yeah, so that's pretty cool. So I know you said there are a lot of services, well, the team has already implemented a lot of services, and there's more on the list that your team is going to build. Now we're also, from my part of the team, working with some
Starting point is 00:37:25 of our strategic partners that they also help us in case somebody's not listening to this podcast and they say man i would love to you know also provide a contribute i guess the best way to get in touch with you how if they want to contribute and not just like test it out but really contribute what's the best way of doing it i think the best way is to join the slack workspace and reach out on the general slack channel and just say hey guys i want to do something with captain this is what i want to do what do you think about that and we're we're really responsive in that channel because we we thrive on engagement from the community and we're we're really happy to see on the one hand, the technical partners jumping on the train, the strategic partners jumping on the train. And maybe there
Starting point is 00:38:09 are also customers out there that just want to write their own service for a specific use case they need to fulfill in their continuous delivery or continuous operations pipeline. It's an open source project: you can contribute, you can write your own service, and we're always open for a conversation about Keptn. And I guess you're also hiring, if I hear this right. We do, yes. Yeah, so that means if somebody wants to become a full-time Keptn developer. Also reach out to us.
Starting point is 00:38:41 Cool, yeah. And I know some lookouts at the window. It's really beautiful here. I'm not sure where you are looking for people, but I mean, we're in Austria right now. Cool, yeah. And I know some, if I look out at the window, it's really beautiful here. I'm not sure where you are looking for people, but I mean we're in Austria right now. Yeah, yeah. So mainly in Austria, in the Linz area, and in the Carinthia area. Klagenfurt, yeah. Perfect.
Starting point is 00:38:55 But if you're really interested in the project and there is somewhere located on the planet Earth, just reach out to us and let's just have a talk. Yeah, exactly. But why limit to the planet Earth? Yeah, true. I'm so sorry. I mean'll figure it out let's just have a talk yeah exactly but but why limit to the planet earth yeah true i'm so sorry come on that's just so there could be a martian that's been observing us for all this time and being like you know what this is how i was going to communicate with the humans finally i was going to contribute to this open source project
Starting point is 00:39:19 and now they're limiting me man yeah sorry for being so close-minded. Yeah. Jeez. Cool. Hey, let me ask you, can I ask a question about the future of Captain? Of course you can. Thank you, Andy. Obviously, it's developed for Google Cloud right now. And with Knative as a major piece of it,
Starting point is 00:39:42 that makes a lot of sense. I was just doing a little bit of searching on Knative on other clouds, and it seems like you can get Knative to work, but it seems kind of tricky. In the future, do you envision, Captain, being able to run on something like AWS or Azure, or is it dependent on their compatibility with Knative? Basically, what's the future plans for expanding beyond Google?
Starting point is 00:40:07 So we, of course, have to plan to have Captain run on all the major cloud platforms and container platforms. So also on AWS, on Azure, on Cloud Foundry, if you want. So OpenShift, of course. The thing is, we are, and I think this is going to be a huge part of our work also in the upcoming weeks to, to find out, um, if it works properly. So, so pure from a theoretical point of view, it's, it's, it's native Kubernetes based. So it should work on, on everything that is based on Kubernetes.
Starting point is 00:40:45 Then again, we know that in practice that is not true. There are certain peculiarities that each cloud provider has built in, and these are the nitty-gritty details that we need to figure out: how to solve them, or how to actually use those features in a positive way to make Keptn work on these platforms. But yes, going forward, we want to have Keptn run on all the major platforms that are out there. And by the time this recording actually airs, I would encourage everyone to double-check what has happened since the recording, because we are actively working on this right now to really get it to run on all of our major platform partners.
Starting point is 00:41:28 So that's why. FRANCESC CAMPOY- Yeah. So either check out the releases on GitHub, or check out the history in the Slack channel, or follow the Twitter handle. So Captain Project is the Twitter handle. Usually, if there is an announcement, like we now run also on AWS or on OpenShift,
Starting point is 00:41:46 this is something we like to communicate actively. Yeah. Awesome. Cool. Well, Dirk, first of all, I can encourage everyone again to look at our performance clinic. You can find it on YouTube and also on Dynatrace University. At the very end, you had a little roadmap slide.
Starting point is 00:42:08 And I know we already covered some of what's coming in the future. So now, at the time of the recording, it's June 4th. Six months from now, what are the big highlights that we can then look back on and say, wow, it's really cool that
Starting point is 00:42:24 we came that far? What are the highlights from your perspective that are coming in the next couple of months? I think one of the largest highlights that also, again, plays into the user experience, so the experience that Captain users have when they were using Captain to make it way easier to interact with Captain.
Starting point is 00:42:44 So it's easy right now. If you watch the performance clinic, you basically have two files that you need to configure, you need an artifact, and you're done. But we want to integrate even more closely with Kubernetes: write our own custom resource definitions and our own Kubernetes operator that handles the uniform and shipyard types. And the reason why we want to do that is that we can make updates of Keptn core services way easier through the operator pattern that Kubernetes provides.
Starting point is 00:43:17 So if we, for example, patch a security vulnerability in one of our projects, we can simply roll it out without you needing to do anything manually. That's why we are going for the operator pattern. The same goes for updating the tools that you're using. For example, if you want to use XL Release as your deployment tool instead of, I don't know, the tool you were using before, you would basically just, again, do a kubectl apply of your uniform YAML.
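The swap just described, apply a new uniform file and let an operator reconcile the difference, can be sketched as a toy diff loop. The service and channel names are made up, and a real Kubernetes operator would of course watch custom resources rather than plain dictionaries.

```python
# Toy reconcile loop in the spirit of the operator pattern: diff the desired
# tool list (from a hypothetical uniform file) against what is running, tear
# down what disappeared, and deploy plus subscribe what is new.

def reconcile(desired, running):
    """desired/running map service name -> event channel; returns actions."""
    actions = []
    for name in set(running) - set(desired):
        actions.append(f"tear down {name}")
    for name, channel in desired.items():
        if name not in running:
            actions.append(f"deploy {name}, subscribe to {channel}")
    return sorted(actions)

running = {"old-deploy-service": "deployment"}
desired = {"xl-release-service": "deployment"}  # tool swapped via the uniform file
for action in reconcile(desired, running):
    print(action)
```

Because the loop only compares desired state against actual state, the user never scripts the transition itself, which is the whole appeal of the pattern.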
Starting point is 00:43:49 And the operator in the background will take care of tearing down the old service, bringing up the new service, and subscribing it to the specific channel. What we're also looking forward to in the next six months is the cross-platform support that we already talked about, running on all the major platforms. And we've also already started with the first ecosystem integrations, where third parties
Starting point is 00:44:15 write Keptn services. With Henrik from Neotys, you've already mentioned that; T-Systems is also about to start. And it's really going to be interesting what the ecosystem contributions bring over the next six months as we start to fan out Keptn into even more implementations. And even without the ecosystem, right now, you did a great job, your team did a great job with the demo apps. So anyone that is listening to this or watching the YouTube version, go to keptn.sh or to the Keptn GitHub page and just follow the examples on there. Because you have a microservice app, I think originally developed by Weaveworks, the Sock Shop app, and you took parts of it out and just show how Keptn completely automates the orchestration of the deployment and operations. Yeah.
Starting point is 00:45:06 So you have step-by-step use cases that you can work through to actually experience yourself what it's like to do a blue-green deployment, and what it's like if a blue-green deployment fails and someone actually tells you: well, your artifact has not met the specified quality requirements, go back and try another one. And the cool thing is that Keptn actually makes blue-green deployments boring, which is actually a good thing,
Starting point is 00:45:31 because you don't have to worry about all the intricate details in the backend, which Istio configuration to change or anything like that, because Keptn just takes care of it. Yeah, Istio configuration can be complicated, I know. But that is the beauty of Keptn: you don't need to. You just say in your shipyard file, I want to do a blue-green deployment, and that's it. And that's it, right? And it happens. Yeah. All right. So, Andy, shall we go ahead and summon the Summarator, the Keptn Summarator?
Starting point is 00:46:03 I think maybe we need to record a new jingle for that. I don't know what that would be. I don't know either. We'll have to get some Captain Crunch music in the background and see how many captains we can get in there. So the way I see it is, Keptn is really
Starting point is 00:46:20 a reference implementation of what we have been preaching for the last couple of years around what we call ACM, Autonomous Cloud Management. That means it's a reference implementation of what we have been preaching for the last couple of years around what we call ACM, Autonomous Cloud Management. That means it's a reference implementation of blue-green deployments, canaries, feature flagging,
Starting point is 00:46:35 automated quality gates in pre-prod, self-healing in prod, basically fully automated in the pipeline, but also operations. And I think that's what Kepton is trying to solve.
Starting point is 00:46:46 Currently it's targeted towards cloud-native applications that are running on Kubernetes-based platforms, and we are really comfortable, as of this recording, on GKE. All the other vendors will follow suit, because we are currently working with them to get it running as well. There are many benefits, but what I really love is the concept of the shipyard and the uniform files: that you can switch tools or include tools
Starting point is 00:47:14 on the fly without having to change any of your custom pipeline code. You can change your quote-unquote pipelines, and we call them shipyard files, on the fly: include a new stage, remove a stage in case you get more mature, or change what happens in a stage. And the only thing you need to do is change a YAML file or a config file and make a call to the Keptn CLI.
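As a rough illustration, a shipyard file in the spirit of what's described might look like the following. The field names are approximated from the Keptn docs of the time, so treat this as a sketch and check keptn.sh/docs for the exact schema.

```yaml
# Illustrative shipyard sketch -- see keptn.sh/docs for the authoritative format.
stages:
  - name: "dev"
    deployment_strategy: "direct"
  - name: "staging"
    deployment_strategy: "blue_green_service"
  - name: "production"
    deployment_strategy: "blue_green_service"
```

Adding, removing, or changing a stage is an edit to this file plus a CLI call, with no custom pipeline code to rewrite.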
Starting point is 00:47:37 It really makes life much easier for developers to deploy and run their cool apps on cloud-native stacks. And I know there's still a long way to go to get it to where we want it to be, but I hope that the people listening in right now get encouraged enough to look at it, to star the repo, to download it, to try it out, to give us feedback, to join the Slack channel.
Starting point is 00:48:01 And from that, hopefully we'll have another session like this a couple of months from now, and then we'll see how far we got. Excellent. Thank you, Andy. I just think this is incredible. I mean, besides the fact that it's a cool tool, what I can't get over is the fact that it's this open source tool that everybody can use. And as you mentioned, you don't even need Dynatrace. It almost, Andy, reminds me of, well, it doesn't even remind me necessarily,
Starting point is 00:48:31 but it draws a parallel to Azure and Microsoft, how we've discussed a bunch of times, you know, the idea you can use a.NET Core running on Java in Azure and you're not paying for their licenses. But in their case, you are paying for the Azure resources, whereas in this case, there is nothing you're paying for. So it just boggles my mind that we're doing all this.
Starting point is 00:48:52 It's awesome. It's the openness, it's the whole idea of: look at this cool thing we've done, and we want more people to be able to do this, and let's get input from everybody and figure out a way to make the world a shiny, happy place. Which, you know, Andy, as we've said on several episodes, I'm always the one who's pessimistic and you're the optimistic one. So it looks like you're winning
Starting point is 00:49:13 in the battle of good and evil. I'll have to come up with some devious plan to take you down next. But it just blows my mind. It's really, really awesome, and I think what you're all doing is amazing. So thank you for doing this, and hopefully it'll really take off, and the technology community at large can be happy with it and thankful. And thanks for doing a double header, Dirk,
Starting point is 00:49:43 webcast to podcast. Cool stuff. Thanks for having me. Well, and Andy. Come on. Don't give me the credit. Andy's the booker. I'm just holding the mic. Yeah. Little release.
Starting point is 00:49:58 Oh, okay. So we should make this go really long so his arm gets tired. Dirk, any final thoughts? Anything you want to make sure people take away? Anything you want to say? Hello to your kids, maybe, though I know they are too young to listen to this. But if you're involved and engaged in continuous delivery and/or continuous operations, please check out what we're doing with Keptn on keptn.sh or on our Slack channel.
Starting point is 00:50:31 Reach out to us, talk to us about your use cases and what you're trying to solve, and let's have a discussion about that. All right. And a reminder to any of the extraterrestrials listening: they are hiring. So you can make your introduction to humanity and the rest of planet Earth via Keptn. So please, we welcome you. Thank you to everybody for listening. And if you have any questions, any topics you would like to be discussed, or if you have something to share and you want to come on the show,
Starting point is 00:51:01 you can send us a tweet at @Pure_DT or an old-fashioned email at pureperformance@dynatrace.com. Thank you all so much for listening, and make sure to check out Keptn. Thank you. Bye. Bye.
