PurePerformance - What we have learned about K8s and Open-source when building Keptn

Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to another episode of Pure Performance. My name is Brian Wilson and as always I have my lovely, beautiful and talented co-host Andy Grabner with me today. Andy, how are you doing today? Are you feeling lovely, talented and beautiful? Well, I thought it's handsome for a male, isn't it? I mean, I'm not a native speaker of English, but isn't that the way it works? I'm stealing English back for myself. I make the rules here. I'm an American, I make the rules. It's okay. Yeah, you learned something. Okay. No, everything is good here, and I think it's my second or third podcast recording now with the new

Starting point is 00:01:01 microphone, and I hope people hear the difference. I approve, and I'm sure it's a bit of an easier setup for you than having those power sources and all that. So for anybody who's into the audio stuff, Andy was using a Zoom, the Zoom H2n microphone, which is great for field recording, but as a sort of home base recording mic it's a bit cumbersome because it's got batteries and all the other stuff. So what do you have now anyway? I got the Razor. Razor.

Starting point is 00:01:31 Who makes that? Is that Motorola? I think it's Razor. I don't know. No, I think the brand is Razor. Okay. Whatever it's called. I don't even know what it's called.

Starting point is 00:01:43 I just installed it, plugged it in, and it worked. I'll have to search it later. Yeah. It was sent to me by, thanks and a shout-out to, our lovely colleagues from the Dynatrace marketing team for recording the Dynatrace Go sessions. We got sent some better equipment, a better camera, and also a better microphone. So that's what I'm using now. Oh, I wonder if that's what I saw Alois with the other day. He was actually shopping for microphones.

Starting point is 00:02:12 He was having some issues. And no, no, it wasn't Alois. It was Michael Kopp in our training. And his camera looks phenomenal. I was like, holy cow. I bet you it's because of that. That's awesome. I need to find out what that is,

Starting point is 00:02:26 because that might be nice to have. Anyhow, we're talking about tech gear. Everyone's working from home, so things to do to improve your sound or your video on those remote meetings definitely helps, I think. So I don't think this is a completely wasted conversation as intro. But since we've been wasting time, we have a, I forget the poet, but there was a, oh, captain, my captain. Right? That's an old famous line from an old famous poem that I can't recall. Obviously, English poem, but we're American. But anyway, I'm rambling today, Andy.

Starting point is 00:02:55 Save me. Sure, I'll save you. Well, as you said, normally you scold me for using Keptn too often in a podcast, but I think today it's allowed to use it a couple more times than usual. We actually got two of the core team members of the Keptn team with us today, and the reason why we brought them on, well, this is something you will hear later. But before we get started, let's just give them a chance to introduce themselves. It's Johannes and Andreas.

Starting point is 00:03:25 So my first question to you, very straightforward: who are you guys? You may want to start with Johannes first and then Andreas. Who are you? Hi, Brian and Andy, many thanks for the invite. I'm Johannes. I'm a main maintainer of the Keptn project, and I've been working in the Innovation Lab of Dynatrace for two years now. And yeah, it's a lot of fun working here, and I'm enjoying every minute I'm in the office working for the Innovation Lab. Thank you. Hi, also from my side. I'm Andreas Grimmer.

Starting point is 00:03:58 And I've also been working for Dynatrace for about one and a half years now. And I'm also looking forward to sharing some of my learnings from this time working on an open source project. Perfect. And folks that may have read the abstract to the podcast, you probably saw the similarity between our two names, Andreas Grabner and Andreas Grimmer. So in case you ever write us an email or find us online, it's also odd for us to have names

Starting point is 00:04:31 that are that close. Even the last name is very close. The way we typically tell ourselves apart, or kind of differentiate ourselves, is with the first name pronunciation. So Andreas goes with Andreas and I go with Andy. So in case you ever come across us,

Starting point is 00:04:45 that's the way we make sure we address each other. I thought it would always be if you two are together, you two would have to both salsa dance and then we'd be able to tell which one's Andy, which one's Andreas.

Starting point is 00:04:55 Because Andreas, I'm just assuming you're not a salsa dancer like Andy. And if you are, you're not the same caliber. But anyway. That's true. And hey, I just want to point out, you two have been talking about how long you've been with Dynatrace.

Starting point is 00:05:09 And the only reason I bring it up is because it's the end of September. Andy Grabner, this marks nine years for me, which I think is about, you're a little longer than that, I think, right? I got 12 and a half, yeah. About the same. Anyhow, and I love working here too. It's awesome. Anyhow, I just wanted to point that out. So for the audience,

Starting point is 00:05:29 the reason why we got these two guys on board is, obviously, we will talk a little bit about Keptn, but the initial thought was to really talk with a group that has been in the Kubernetes space now with a project, with a service, with an application for a year and a half. And we really wanted to talk a lot about the learnings. What have you learned? What did you do in the beginning? What made you make certain decisions? Which decisions would you have made

Starting point is 00:05:55 differently? Looking back, because we know a lot of organizations are investing in Kubernetes, they're starting to build applications. And I think we see there's a lot of learnings, and I think we have our learnings, and that's why we want to share those. And Andy, I'm interrupting you all the time today. I think this also ties back to, there was a podcast we did a while ago about lessons learned or things to learn

Starting point is 00:06:21 and consider when trying to go for open source. And I think, reading over the notes for this, some of this can apply to that as well. Because obviously, besides the work we're doing with Keptn and all those considerations, there are some of the challenges that have gone into this, which we'll reveal during the episode: considerations around the whole open source thing, and how sometimes you do have to change things and you're like,

Starting point is 00:06:44 okay, we got this far, but now we know we've got to go back and make a major change in order to keep it going forward. So for anyone working on an open source project, there are some other good notes in here for that as well, I believe. Yeah, but maybe Johannes, as you introduced yourself as one of the core maintainers, can you take us back a year and a half to when this all started? How did you start, why Kubernetes, and what were the initial lessons learned? Absolutely.

Starting point is 00:07:12 Because I think it's important to understand how everything started so that we can also then better explain our learnings that we had along our journey. And for answering this question, I think we have to go back in time because it happened two years ago that we in the innovation lab,

Starting point is 00:07:32 we had the goal to bootstrap a workshop. A workshop that helps our customers and partners to deal with the complexity and with the challenges that come with the cloud. As you all might know, Dynatrace is a SaaS company, and from our own journey of moving workloads from on-prem to the cloud, we also learned a lot of things. We developed best practices, we applied certain strategies and learnings, and those we also wanted to deliver to customers. For example, we collected all the best practices around automating the monitoring, about performance

Starting point is 00:08:17 testing, then also setting up quality standards for your delivery and also applying deployment strategies. And this we all bundled into a workshop. And then we delivered this workshop to participants that were joining a workshop. And the outcome was always great because at the end of such a workshop, the participants, they were really amazed by what we showed them because we had hands-on labs, then small demos, and they could really feel how it works to deal with cloud tasks. And at the end, they all were very motivated to also integrate those components into their own organization. But after a couple of weeks, they came then back to us and said, well, we wanted to get

Starting point is 00:09:13 started. We wanted to set up performance testing. We wanted to set up quality gates in our pipelines, but we are missing a very central component, a component that helps us to get started. And this was then the kickoff of the Keptn project, because with the Keptn project we announced, and are still working on, a framework for automating continuous deployment tasks

Starting point is 00:09:40 and also going beyond deployment and helping to automate operations. And this was then the kickoff. This happened in January 2019. And since then, we have been constantly improving and evolving the Keptn project. This was the intro and what happened two years ago. Yeah. And I think it's great that you bring us back because I also remember, I think it was in

Starting point is 00:10:09 the, was it in the hotel? What was it called again? Next to our office. Donavelin, right? I think that's what. Yeah, in Donavelin. I think it was in Donavelin, if I remember. Right.

Starting point is 00:10:20 Yeah. And we had all the folks together and we were basically using Jenkins for automating a couple of tasks. Then we were, as you said, putting things together and then said, well, this works great with our sample app. And now we know the best practices and how it can look like. And you're completely right. Then basically, we said, now let's implement this. And we said, yes, it worked well for the demo app. But how can we get this on the

Starting point is 00:10:45 road for real, right? How can we build something that actually works for everyone out there, not just for the sample app that we picked? Yeah, true. Yeah, cool. So the next question that I have, and I think this is also important, because as you said, Johannes, in order for the listeners to understand why we made certain decisions, or why you made certain decisions, it's also important to understand what the product, what the tool is all about. Maybe Andreas, kicking it over to you. While Johannes has already covered a little bit of the use cases, can you go a little bit deeper into what the use cases are that you guys are addressing? Sure. So as Johannes explained, Keptn addresses two use cases, namely the deployment, testing and evaluation of artifacts, as well as operations tasks and automating them. So Keptn now provides you with a framework in order to implement this.

Starting point is 00:11:48 And fortunately, you do not have to start from scratch here. Instead, Keptn gives you lots of best practices which are already built in. For example, one of our best practices is the so-called quality gate. And this quality gate allows you to evaluate the quality of your microservices in a fully automatic way. So therefore, we are using well-known concepts

Starting point is 00:12:16 like service level indicators and service level objectives. The service level indicators can be any metric which you get from your monitoring solution, like Prometheus, Dynatrace, you name it. Using the service level indicators, you can then formulate service level objectives, and using these objectives we can then build a quality gate. So this is really one best practice which is built into Keptn. And then we can use the result of the quality gate, for example, to decide whether an artifact should be promoted into production or not. Very cool. And Andreas, thanks for that explanation.
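
The quality gate idea described above can be illustrated with a minimal sketch. This is not Keptn's implementation or file format; the `Objective` class, the `evaluate_quality_gate` function, the metric names, and the threshold values are all hypothetical and only show the principle: SLI values come from a monitoring tool, SLOs put bounds on them, and the gate passes only if every objective is met.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Objective:
    """A service level objective: an upper bound on one SLI (hypothetical shape)."""
    sli: str            # name of the service level indicator, e.g. a metric name
    upper_bound: float  # the SLI value must stay at or below this bound


def evaluate_quality_gate(sli_values: Dict[str, float],
                          objectives: List[Objective]) -> bool:
    """Return True only if every objective is met by the measured SLI values."""
    for objective in objectives:
        value = sli_values.get(objective.sli)
        # A missing SLI counts as a failure: we cannot prove the objective holds.
        if value is None or value > objective.upper_bound:
            return False
    return True


# Made-up example data; in practice the values would come from a monitoring tool.
objectives = [
    Objective(sli="response_time_p95_ms", upper_bound=500.0),
    Objective(sli="error_rate_percent", upper_bound=1.0),
]
good_build = {"response_time_p95_ms": 420.0, "error_rate_percent": 0.3}
slow_build = {"response_time_p95_ms": 810.0, "error_rate_percent": 0.3}

promote_good = evaluate_quality_gate(good_build, objectives)
promote_slow = evaluate_quality_gate(slow_build, objectives)
```

In this sketch the boolean result plays the role of the promotion decision mentioned in the conversation: the first build would be promoted, the second would not.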

Starting point is 00:13:03 I mean, obviously, for me this should be nothing new because we're working very closely on this. And the concept of the SLI providers, of pulling data in from different data sources, I think is also what attracts a lot of people to Keptn, because Keptn as an open source project is agnostic to the underlying data source. And I really like the fact that you architected it in a way that you can easily replace components such as the data source. And I also want to do a quick shout-out for people that want to learn how to build their own SLI data source. I'm giving a talk at the Neotys PAC event that happens in early October. By the time this show airs,

Starting point is 00:13:50 I think it's probably around the same time. But anyway, the recording should be out there. I'm actually going through the use case on how to write your own SLI provider because we have seen users out there asking, so how can I pull in data from Splunk? How can I pull it in, I think, Wavefront? Intuit is building an integration there.

Starting point is 00:14:12 How can I pull data in from other APM tools? And so I want to show how this works. Andy Grabner, can I ask you a question about that? Of course. I think it's all related. So one of the things I've noticed in Keptn, and again, I have very limited use of it, is that usually we're saying a data source, pull from Dynatrace. Now Keptn can pull from multiple data sources, right? So the idea, the benefit of

Starting point is 00:14:37 what you're describing is that, all within the same test, you'll be able to pull metrics from multiple data sources to do your evaluation. Is that true? So right now, and guys, correct me if I'm wrong, we have a kind of one-to-one relationship from a Keptn project to a data source. So that means you can set up different projects, and then for every project you have a different data source. You can say, I'm running one for my monitoring data and then one for my testing data.

Starting point is 00:15:09 So this is one way of doing it. The plan is to support multiple SLI providers for the same project in the future. But for this, I don't want to go too far out and promise too many things. This is where I hand it back to Andreas and Johannes to see where we're standing with these multiple SLI providers. Yeah, you're absolutely right, Andy. For pulling in data from different sources, you have to split those based on projects. That is what you can use as of today. But plans are going forward and we want to also include other sources

Starting point is 00:15:47 into the same testing process and evaluation process. And I think the reason, and again, Johannes, correct me if I'm wrong, why we've been kind of dragging out these capabilities is that what we have seen is that a lot of organizations do have a data source where a lot of data ends up anyway. So with our Dynatrace customers, they're pushing all the data into Dynatrace first, and then you can get this data out. Like we were doing a lot of work with the Neotys folks.

Starting point is 00:16:16 They have their own SLI provider, yet they're also pushing data to Dynatrace. So that means through the Dynatrace SLI provider, you can get all this data. Or Sumit from Intuit, right? They are pushing all the data into Wavefront, and then they're pulling the data from there. This is why we've also kind of dragged out the decision to focus on this feature, and focused on other things first. It does make sense too, because the idea of putting all of your data in one source makes a lot of sense for maintenance and cleanliness and all that. So yeah, good points.

Starting point is 00:16:52 I was just curious there. Thanks. Yeah. So I've got a follow-up question here, and I think this also brings us even closer to some of the decisions you made. Andreas explained the use cases, explained SLIs and SLOs, and I talked about the openness. How does Keptn actually work then? Maybe this is a question for Johannes. How does it work, and, I think this is what we're promoting, how do you separate

Starting point is 00:17:17 the concerns between process definition and tool definition? How does this all work internally, and why is it so flexible that you can actually extend Keptn as you like? Yeah, that's a really good question. At its core, Keptn follows an event-based architecture. This means that Keptn itself receives events from the outside, then it analyzes the event and kicks off a process, a process like delivery or remediation. And then Keptn takes care of orchestrating all the events that are then sent out by Keptn and received by Keptn.

Starting point is 00:18:01 And this is the way that other tools can be plugged in, because they just have to listen to certain events. Whenever they receive one, they do their job and then they just respond with a finished event. And this is then the trigger for Keptn to go on and do the next orchestration step. I always bring up the example of the continuous delivery pipeline when I explain the event-based approach of Keptn because, as you know, continuous delivery always has testing in mind. And when you kick off a delivery process,

Starting point is 00:18:40 then Keptn will at some point send out a test.triggered event. This will then be picked up by a tool like JMeter, Selenium, or any other testing tool. Then the tool does its job, and when it's done, it just responds with a test.finished event. And finally, Keptn then goes on and takes care of executing the next step in the delivery process. This flexibility allows customers to also plug in the tools they already have in their organization and in their operational or development processes. And yeah, this is the great flexibility that comes with Keptn. And maybe just one additional thought for understanding how Keptn works from a technical perspective: we

Starting point is 00:19:34 split Keptn into two components. One we call the control plane, which takes care of all the eventing mechanisms, and the other one is the execution plane, which is then responsible for executing the tasks like testing, quality evaluation, or a remediation action. As of today, both planes are running on Kubernetes, but we are planning and already implementing features to also let the execution plane run outside of Kubernetes. But yeah, as of today, Kubernetes is the underlying platform that Keptn is running on. Very cool.
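
The triggered/finished event flow and the control plane versus execution plane split can be sketched in a few lines. This is a simplified in-process illustration under stated assumptions, not how Keptn is implemented: the `ControlPlane` class, its methods, the shortened event names, the two handler functions, and the service name "carts" are all hypothetical. The point is only that the orchestrator knows the process (an ordered list of tasks) while the tools that do the work are plugged in by subscribing to events.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]


class ControlPlane:
    """Orchestrates a process by sending '<task>.triggered' events in order."""

    def __init__(self, sequence: List[str]) -> None:
        self.sequence = sequence            # the process: ordered task names
        self.handlers: Dict[str, Handler] = {}
        self.event_log: List[str] = []      # every event sent or received

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Plug in a tool (execution plane) for one event type."""
        self.handlers[event_type] = handler

    def run(self, context: dict) -> dict:
        for task in self.sequence:
            triggered = f"{task}.triggered"
            self.event_log.append(triggered)
            handler = self.handlers.get(triggered)
            if handler is None:
                raise RuntimeError(f"no tool subscribed to {triggered}")
            result = handler(context)       # the tool does its job
            self.event_log.append(f"{task}.finished")
            # The finished event's payload is available to later tasks.
            context = {**context, task: result}
        return context


def run_tests(context: dict) -> dict:
    # Stand-in for a testing tool; always reports success in this sketch.
    return {"passed": True}


def evaluate(context: dict) -> dict:
    # Stand-in for a quality evaluation that uses the test result.
    return {"promote": context["test"]["passed"]}


control_plane = ControlPlane(["test", "evaluation"])
control_plane.subscribe("test.triggered", run_tests)
control_plane.subscribe("evaluation.triggered", evaluate)
final_context = control_plane.run({"service": "carts"})
```

Swapping the testing tool in this sketch means subscribing a different function to `test.triggered`; the sequence held by the control plane does not change, which is the separation of process and tooling discussed in the conversation.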

Starting point is 00:20:21 Hey, Johannes, I got a question here, or maybe a confirmation, because of the way I always try to explain it. Basically, what you just explained to me is that we bring a concept into continuous delivery that we have seen from architects that are designing modern applications. That means you have loosely coupled components, you call them the execution plane, right? You have loosely coupled services and they all talk to each other through events. And there's obviously one component that makes sure that the right events are sent at the right time. It's like a business process engine that is orchestrating a process by sending the right events at the right time, and then you have loosely coupled things, which can also be replaced very easily, that actually do a task. Is this correctly understood? Absolutely correct. Yeah, absolutely correct. Also important to understand is that Keptn follows the principle of separating the process, which can be a delivery process or

Starting point is 00:21:28 a remediation process, from the actual tooling so that you can easily plug in different tools depending on your ecosystem and also on the tools that the cloud provider offers you.

Starting point is 00:21:48 Very cool. So I think we talked a lot about the use cases and how Keptn works. So I think by now, listeners, if you're still here, you should hopefully understand what Keptn does. And if not, then we really need to figure out what we're doing wrong in explaining it. There's a lot of other material out there too if you want to catch up. There are videos, there are tutorials, you can go to tutorials.keptn.sh, and as we said, there's a YouTube channel with a lot of additional information. But now let's really get into the lessons learned. I want to really understand how we got from where we started, after we left that room in that hotel where we had a demo environment, to where we are now. How did we get started? What did we, you know, because

Starting point is 00:22:33 it's a big problem that we had to solve, and I guess we had different, we call it Baustellen in German, different construction sites, different things that we had to attack. So maybe Andreas, I think it's time for you again. What are the steps that we took back then, and the lessons learned? Yeah, I can confirm that it is definitely a long way from being a demo to being a real project. So I'd especially like to explain our installer to you, because this installer really nicely describes the maturity of the Keptn project.

Starting point is 00:23:18 So we started, of course, with a bunch of shell scripts, as everyone does. These shell scripts were executed by the users locally. And as you can imagine, we ran into lots of problems. We had lots of dependencies on external tools at specific versions and so on. And if such a tool wasn't available, our demo did not work. So basically, this was the first generation of Keptn 0.1: a bunch of shell scripts.

Starting point is 00:23:52 In the second generation, we thought, this is no way to go, we have to do something about it. So we basically containerized the complete installer. We got a Docker image, which was then executed by a Kubernetes job, and this Kubernetes job basically set up our complete demo environment. This fully automatic installation was great for doing demos because we automatically installed Helm with Tiller, because Helm version 3 was not available yet. We installed Istio, we installed Knative,

Starting point is 00:24:35 and we even set up some virtual services and gateways in order to access our UI and also the API. Yeah, but as I said, we had lots of dependencies. And you can imagine not every Keptn user really would like, for example, to have Istio installed. So this basically kicked off our third generation. And here we basically put the installer on a radical diet. We sat down and did a re-evaluation of our dependencies. For example, let's pick Knative.

Starting point is 00:25:17 So in the first place, Knative looked like a really nice match for Keptn because Knative Eventing was the perfect tool to manage, subscribe to, and deliver our Keptn events. Also, Knative Serving was really cool because this would even allow us to scale our services down to zero and save resources. However, we found out that Knative was definitely not the right choice for us. First of all, it took so many resources that we, for example, needed 16 virtual CPUs just for a demo setup. And honestly, Knative was also too unstable at this time, because

Starting point is 00:26:07 we always made the joke that we spent much more time debugging Knative than Keptn itself. So my learning here is: definitely do in-depth research on your dependencies, and especially

Starting point is 00:26:24 if you're using dependencies which have a version below 1.0. So really be careful here. And I can continue now the story. So we even were able to remove

Starting point is 00:26:39 Istio. We were able to remove Tiller because Helm 3 was there. And basically we ended up with an installer which only consists of a single Helm chart. And this Helm chart now allows the user to configure, for example, the services and how the API is exposed. And this was a great move, which we did last year. So Andreas, a quick question here, a recap, because I think you just said a lot of interesting things

Starting point is 00:27:14 that I also vividly remembered as we went through that kind of progress of maturing. A lot of people will run into the same thing. So I like the installer story because the first time we tried to install everything through shell script from the outside and basically installing things, probably using kubectl commands that we pushed, that we basically put together in shell scripts, right?

Starting point is 00:27:42 It was all kubectl, doing this and this, and it was probably very lengthy shell scripts. That's right. The next thing: instead of doing it from the outside, we baked it into a container and then let that container run from within the Kubernetes cluster to avoid, you know, problems with not being able to execute shell scripts correctly. And then we basically put this into an installer job. But then the other thing that I thought was fantastic, and I want to make sure that people don't think we are bashing projects that are not at version 1.0 yet, because remember, guys,

Starting point is 00:28:19 Keptn is also still at 0.something. But there's obviously a price, quote unquote, to pay if you are jumping on a new technology or framework early on. And you are also playing, you know, a tester in most cases. And that's what we did with Knative. And I'm pretty sure Knative now has matured and is a great framework for use cases where it really makes sense. Right. I think that's what I wanted to say. And, um,

Starting point is 00:28:51 Yeah, I'd add to that, Andy, because I used to play with Keptn in some of the earlier days. And it's funny that you mentioned the bit with Tiller and Knative, because I remember going through some of the exercises, and especially once I got to the Tiller part, there would always be some error, and I'd start looking up what my environment had wrong, if there was a permission issue.

Starting point is 00:29:11 And I remember in the earlier days thinking like, how is this gonna be usable if this is happening? And you're describing exactly, you just described exactly what happened to me is what you all discovered and said, well, it's because of these dependencies, let's get rid of these dependencies. And I think that just is a very strong action to take to say,

Starting point is 00:29:32 this is just not going to work this way. Let's, let's change it now while we can. And you have to make those decisions, which sometimes can be tough because you look at, you know, I don't know how much work was involved in that decision, but you'll come to the point where you say, either we're going to keep having these kinds of issues, or we put in the work now and save on it later. So kudos to that. Alright, I'm sorry

Starting point is 00:30:00 that I interrupted you, but I think you wanted to go on with some other stuff. No, I think the installer really shows how we got from a demo setup to a project setup. There are so many steps which you have to take to evaluate your dependencies, and of course you also need to get your features made. Sure. I want to then touch on one more thing though, because I know in which state we are right now. As of the time of the recording, we have version 0.7.1 released. You just mentioned we provide the Helm option to install Keptn, especially because I think we have a couple of customers or users that are using air-gapped systems, where they want to first download all the images and then use Helm to deploy after they've added and analyzed all the images.

Starting point is 00:31:07 I think that's great. But I also want to bring up one thing and I want to get your kind of public opinion on this. We've reduced complexity on our end, as you said, by only installing things that are really necessary for Keptn. We don't install Istio anymore by default. We don't configure virtual services. So we basically leave this to the end user depending on how they want to use Keptn, because we assume if they want to do blue-green deployments with Keptn, then they may already have Istio and we use it.

Starting point is 00:31:41 They can decide how to expose Keptn to the outside world because they already have an ingress. Now, the point that I want to make here: while we made all these steps, we assumed, or we kind of pushed away, the responsibility for the things around Keptn, everything that is necessary for something to run in Kubernetes and to be exposed to the outside world. We kind of pushed this away from us.

Starting point is 00:32:06 And we assumed that our users know about these things and they can provide these things and they know how to work. But the reality, at least for me, it seems that we see a lot of people just getting started with Kubernetes. And now they're struggling also with these, let's say, quote unquote, basic concepts that we have assumed people know. I think I wanted to hear your thoughts on this. If maybe I just see this because I work with a couple of users, or if you also see this. Yes.

Starting point is 00:32:44 I would say we have sharpened the focus of Keptn. So Keptn shouldn't be the tool, for example, which manages your certificates. So yes, I completely agree that this challenge is now basically pushed to our users. But I also think that this is the right place where this should live, for example the management of the certificates. Before, we used Istio, for example, in order to expose services. And as you can imagine, a framework like Istio also brings lots of complexity with it. So going back and using Kubernetes primitives like services with a load balancer or node port,

Starting point is 00:33:31 or using a Kubernetes ingress is sometimes easier than using Istio with gateways and virtual services. So, yes, but of course, we also provide some quick starts on how to set up your production environment with Keptn. So I wouldn't say that the user is now alone. There is always help from our side.

Starting point is 00:34:00 But maybe this is a little bit more in the documentation now. Yeah. Can I bring in one learning from my end? Because it also relates to this question or to this discussion because I think you can never make assumptions

Starting point is 00:34:17 about the environment you will be deployed on. This was also one thing that we had in our first version of the installer, and also in the second generation. We made too many assumptions about the environment that we deploy Keptn on. And we learned that we cannot assume this or that. It always depends on the user and also on the setup the user has available. Because there is, for example, the OpenShift group.

Starting point is 00:34:52 Then there is also the group that has an air-gapped system where you have no internet connectivity to the outside world. And then there is the group of users that have access to a regular Kubernetes deployment, like in Google Cloud or AWS. And there is so much difference between those groups that you can never make any assumption

Starting point is 00:35:17 about the environment. Yeah, I think that's a really good point. Never make assumptions about which environment you end up running in. And I think there's one additional point that I want to make, because, to be honest with everybody out there, to be frank, I had no clue about Kubernetes a year ago, and I'm still struggling with all the basic concepts because I simply was never trained on it. Even though I have a basic understanding of networking, I'm completely blank when it comes to, you know, routing and certificates.

Starting point is 00:35:52 This is just something that is definitely not my stronghold. But now, if I want to go towards Kubernetes, whether this is Keptn or anything else, I believe the big lesson learned for me as a consumer, as a user of the Kubernetes ecosystem, is that I need to be aware of these things, because I'm all of a sudden, I think, responsible for them, unless I work for an organization where they provide Kubernetes as a service to me. But then most likely this Kubernetes is so locked down or so special that it's again hard to get specific software in. So I think the lesson learned for me is: learn, learn, learn. You need to understand Kubernetes, and that also includes networking, routing, security. I think we all need to have a basic understanding, because otherwise there is a lot of struggle on every side here.

Starting point is 00:36:49 All right. That was my little thing, my advice and lessons learned from my side. You sounded like me there for a moment, Andy. I know, I know, I know. Right? No, but, I mean, Kubernetes is amazing and I think it opens so many doors, but I think we also need to understand that. Brian, I think we both come from a Windows background:

Starting point is 00:37:12 just because I know how to launch Windows 10 and how to open PowerShell doesn't mean I know how to properly operate a Kubernetes cluster. I installed Windows 2000 from 3.5-inch disks, so that puts me in another class. So I think, guys, great lessons learned. And I know there are a lot of different avenues; as you said, there's a lot of stuff in the documentation where we help guide people through the things that they need to know and need to install.

Starting point is 00:37:48 Now, shifting a little bit to a different area, there was a strategic decision, I assume, to make an open source project. And I was wondering how does this all work out with an open source project? Is this more beneficial in the end than... Is it worth the effort going down the open source route? Is it slowing you down because there's certain things and processes in place? What are the lessons learned from actually running an open source project?

Starting point is 00:38:23 And in this particular case, a CNCF sandbox project. I can get started. I think the first really cool thing is that the Keptn community doesn't consist only of the core developers, because there are also the users out there: the users that try out Keptn, that find a bug and report it, and that also bring in new ideas for features and enhancements. This is really what makes working on an open source project fun and very, very cool. I think this is one learning from my end: bringing the actual user closer to the product is definitely a positive aspect of running it as an open source project.

Starting point is 00:39:10 Very cool. Any other thoughts? Andreas, maybe, from an open source perspective? So Keptn is intentionally designed to be an open system which allows custom integrations, and a few minutes ago we talked about

Starting point is 00:39:28 SLI providers. So we have one for Dynatrace, we have one for Prometheus, but now the Keptn community is building an SLI provider for Wavefront and other tools. And this really allows us to scale Keptn. Otherwise, if we weren't an open source project, it would simply be impossible to build every SLI provider on our own. So

Starting point is 00:39:57 I see great benefits in providing a great ecosystem which then addresses lots of use cases. Yeah, I also remember I had a call earlier this morning with a user who was struggling with the Dynatrace SLI integration that I actually wrote the code for.

Starting point is 00:40:19 And he said, you know what, I pinged you earlier because I wanted to get some help. And we went on the call and then he said, you know what, I just looked up the source code because it's available anyway. And so I found my way around. I understand now how it works. So I think this is also a great thing about having this open out there, open and flat out there. People can, if they're not afraid of code or of looking at other people's code, also help themselves or build integrations, as you said.

Starting point is 00:40:47 Yeah. I totally agree. And I had the same story in my mind: users really are pointing you to a line of source code that contains a bug. And then it's really an easy game to fix the bug and to deliver a new feature. This would not be possible in a closed source product. And yeah, this helps in engaging with the users, and also helps the users engage with the Keptn team. I think it's a win-win situation for both the users and the developers of Keptn. So, great stories, but I didn't know that

Starting point is 00:41:29 there are bugs in our source code. I think we have a colleague who calls them opportunities, right? A lot of opportunities in our source code to make it better. Exactly. Wow, that's some serious HR-type action there. I know, it's called an opportunity. It's an opportunity. Wow, you're going to have to remember that one. Yeah. Hey, I know you guys have been doing this for a year and a half. Remind me, since when have you been engaged with CNCF? Engaged with CNCF, we kicked off in August 2019.

Starting point is 00:42:19 Yeah, that's when we started to write our proposal for CNCF. It was in August last year. Perfect. And then I think earlier this summer, or maybe in the spring, we got officially accepted as a sandbox project, I believe. True. Yeah. Cool.

Starting point is 00:42:41 And now we are on the road to, what's the next step? Incubation, I think, is the next. Yeah. Now one question, and this, I think, Brian, with whom did we discuss open source projects and how to run them and lessons learned? That's a good question. She was from Google, I believe. Yeah, I think she was from Google and she talked about open source projects. Now, I know that both of you, and also the rest of the team, are not only active in our own Keptn community, but I believe, Andreas, you are also part of the CDF, right? You're an active member of the Continuous Delivery Foundation.

Starting point is 00:43:24 How does that go? Right, this is a similar organization to CNCF, and I'm also a member of a special interest group on interoperability. This is a perfect match for Keptn, because Keptn really tries to address these interoperability problems using events. That's why I'm a member of this group. For example, we are currently working on a white paper that lays out the advantages of using event-based systems for continuous delivery as well. Oh, that's cool. Yeah.

Starting point is 00:44:09 And I think the reason why I wanted to bring it up, I believe this is also very important that when you are running an open source project and you hope for external contributors, which obviously is one of our goals in order to grow an ecosystem, you need to have external contributors. But I think you also need to give back and kind of contribute to other communities. And this is why I wanted to highlight that you are part of the CDF. And that has obviously benefited the CDF. It has also already benefited us because they gave us a podcast. We're now speaking at the conference and we are in constant exchange.

Starting point is 00:44:44 So I think as a best practice for everybody out there that wants to start: don't just think about your own open source project, but if you want to make it big, also contribute to others, because in the end it's a global community and we need to cross-pollinate or whatever you want to call this. I'm not sure what the right word is for that. One other topic, a quick one, staying on this with how to grow

Starting point is 00:45:13 an open source project. We do have external contributors, right? There's a couple of people that have contributed already. True, yep. We have a couple of folks out there that provide contributions almost on a weekly basis. And I know there was Imre. I think he was great, and maybe you can tell his story better than I can: the way he started with Keptn and why.

Starting point is 00:45:49 Yeah, he took a look at our issues, and we have a couple of those marked as good first issues. Those are issues that we consider good candidates for getting started and for getting involved in the dev process of Keptn. And he picked one of those. He assigned himself to the issue, got it implemented, and then he filed a PR

Starting point is 00:46:14 against the Keptn core code base. And we approved it. And then he was in. This was his first contribution, and he then continued to provide other features as well. Yeah, that's cool, because I remember I had a conversation with him, and I think he said he wanted to get into the open source space. He wanted to contribute, but then he looked at Kubernetes as a project and he said it's so huge, and he didn't really know: should he contribute, I don't know, a README change? But then he said no, he wants to actually contribute some code. So he was actually looking

Starting point is 00:46:55 in a space that is interesting for him, which is the space we are in with Keptn, but he also picked a smaller project where it was easier to contribute code as well. And I think he was really helpful. He was really happy with the way we onboarded him into the community, the way he felt. So that was great to see. I think we exchanged some thoughts on the podcast that I recorded with him, because he's also running a podcast in Indonesia, where he's from. That was pretty cool. Andreas and Johannes, to kind of round up the open source project discussion,

Starting point is 00:47:33 is there anything else that we have learned over the last year or so since we have been, you know, pushing this open source project, especially around CNCF? Are there any things that people who want to start their own CNCF project should be aware of? Things that we didn't anticipate in the beginning, you know, some challenges, some things they require from us, just hurdles, or maybe even things that people should know they will have to do if they want to become

Starting point is 00:48:07 a CNCF project. That's one thing that we also had to learn: it's all about documentation first. You need to describe what you want to implement in your issue. You also need to describe

Starting point is 00:48:23 what's on the roadmap, and you need to provide documentation to make it even possible for someone to contribute. And when I think back one year, our issues were a one-liner just explaining what needs to be done. And now they are really nicely framed,

Starting point is 00:48:42 where we have an explanation: what's the problem, what's the task, what's the definition of done. And this then allows other people and also contributors to understand what needs to be implemented. Long story short, documentation first is definitely a learning that you have to apply when it comes to an open source project. Anything else from you, Andreas?

Starting point is 00:49:08 Anything that is good to know if you go down the route of an open source project that you learned? So I can only confirm what Johannes said, because you really have to learn that all communication is asynchronous. So it would be best not to talk to each other, but instead to write your thoughts on the GitHub issue. We all know that this is lots of work, and sometimes it's easier to discuss this directly. But when you write it down, it really becomes transparent for everybody and

Starting point is 00:49:48 then you can really see how decisions are made and you can influence decisions, which is really interesting. This was definitely a learning from my side, and also that we have to keep the documentation up to date

Starting point is 00:50:04 and keep transparency. So that means what you just described is a new opportunity for a new open source project that is transcribing Zoom conversations or whatever thing and then automatically putting comments on pull requests or issues with the right username attached to it. I mean, that would be awesome, wouldn't it? Because then you can have a conversation like we have, but everything is fully documented in Git. Yeah, but then you've got to watch what you say.

Starting point is 00:50:35 It's true. That's stupid. Andy wanted me to put this thing in here now. Oops, I meant Grim or not Grim. Yeah. No, but that's it. It's an interesting point, too, because for so long, there's been the idea that you have to get all

Starting point is 00:50:56 developers together in a room, right? I think especially pre-COVID, there was this idea that working remote in certain situations, especially with development teams, can be detrimental because they can't just get together and whiteboard things and hash things out. And obviously, with COVID, we're seeing that's not so much the case if you have good people. But I think this almost takes the idea one step further, where the lesson from the distributed team that you're working with in an open source project is to say: no, don't have a conversation with them; go even further away from getting in the room and just type it all out. Which sounds so counterintuitive, or at least it's so

Starting point is 00:51:42 different from what was being said a year ago or two years ago in terms of everyone has to be in the same room and hash out ideas. That putting it into words, typing it out there, making it visible for all to see, and sort of maybe not necessarily slowing down the process, but removing from the process the idea of communication, one-to-one direct communication, is really interesting. And I don't know how to account for that, if it's applicable more widespread than just an open-source kind of GitHub situation

Starting point is 00:52:18 or not. So it's kind of opening a new idea here. Yeah. No, I think you're right. I mean, if you are, first of all, you take, even though there are obviously emojis and exclamation points, but typically you probably take the emotion out of conversations

Starting point is 00:52:41 and therefore really have to think about what you want to write, in a way that people understand. And I'm sure we've all been in that situation: you want to say something right now, but then you take a step back and wait a minute or two and then start writing it down, and the stuff that you write down often makes more sense and is clearer. So I like it, but it's obviously a big change to the way we as humans have done collaboration, especially in times when we've all been in offices. Yeah. I think also there's this idea everyone falls in love with, and I have no idea how true it is, but it would seem that, face to face, we would have a lot more opportunities for those aha, those eureka moments. As opposed to if I'm calmly writing down something in an email, then you're reading it, thinking about it, reacting to it.

Starting point is 00:53:38 There isn't that chemical interaction that occurs when people are in the same room. You know, I think our bodies even just undergo a change when you're in the same room, and who knows if that makes the thinking process different. Do breakthroughs not happen, or are they more likely to happen? That's a research project for somebody out there. Are breakthroughs more likely to happen if people are reading well-written, concise things that they can ponder on and think on before writing? Or how many times does one-to-one interaction in the presence of others actually spawn a useful eureka or aha moment? So if someone's looking for a thesis paper, go ahead and run with that one. Hey Brian, I don't want to do a Summarator today, because I think

Starting point is 00:54:23 maybe the Summarator is going on retirement for a while. It seems so. Because, guys, typically I try to summarize the things that I've learned, but I think I always kind of recapped what you guys have been saying in the different sections of the podcast.

Starting point is 00:54:38 Agile recap. Agile recap. Exactly. But the thing that I want to ask each of you individually, and I want to start with Johannes: looking back one and a half years, if you would start over now from scratch, where we were when we left that room, with all the knowledge you have about Kubernetes, about open

Starting point is 00:55:06 source projects, what would you do differently now? Would you change the architecture again? Would you pick different frameworks for certain things? Would you still pick Kubernetes or pick something else? Or any other thing, one thing that comes to mind that you would change and would do differently? Johannes. That's an excellent question. First of all, I think I would still use Kubernetes

Starting point is 00:55:38 as the container orchestration framework for deploying and running our framework. But one thing that I would now do differently is that I would re-evaluate each and every dependency you bring in. Because each dependency needs to be updated. They have security vulnerabilities you need to be aware of, which also force an update. And in the end, each dependency also has a customer impact. Therefore, you really need to reconsider what you bring into your product.

Starting point is 00:56:16 And maybe not from a technical point of view, but more from a product point of view: I mean, we had great opportunities to talk to customers at an early stage, but I would also bring customers into the brainstorming and decision-making phase right at the beginning. Because in the end, you want to solve a customer problem or a user problem. And those problems must be written down on a whiteboard right at the beginning, when you start working on a project that solves a problem. Thank you for that. Andreas, anything from your end? Yeah, I would also go with Kubernetes, but probably rethink some of the types we are using.

Starting point is 00:57:12 So some resources, for example our shipyard file, could be a custom resource in Kubernetes, using operators to control the lifecycle of these resources. Maybe this would be a good design decision, and maybe we will do it. But we, of course, made the decision not to make ourselves dependent on Kubernetes. Keptn should run anywhere. And that was the reason why we stuck to this architecture. And there are lots of small issues.

Starting point is 00:57:53 For example, introducing health checks, resource limits, role-based access control. You have to do all of this from the beginning. Basically, do the design with security in mind from the very start. This is really a learning. And if we were to restart now, I would definitely give this more consideration. That's very good advice.
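
As an illustration of checking for health checks and resource limits from the start, here is a minimal sketch in Python that scans a Deployment manifest (already loaded as a dict) for containers missing them. The function name `find_gaps` and the sample container are hypothetical; the field names `livenessProbe`, `readinessProbe`, and `resources.limits` are the standard Kubernetes container fields. This is not how Keptn itself validates anything, just one way such a check could look.

```python
def find_gaps(deployment):
    """List containers in a Deployment manifest that lack probes or limits."""
    containers = (
        deployment.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    gaps = []
    for container in containers:
        name = container.get("name", "<unnamed>")
        # Health checks: Kubernetes expresses them as liveness/readiness probes.
        if "livenessProbe" not in container:
            gaps.append(f"{name}: no livenessProbe")
        if "readinessProbe" not in container:
            gaps.append(f"{name}: no readinessProbe")
        # Resource limits live under resources.limits (cpu, memory).
        if not container.get("resources", {}).get("limits"):
            gaps.append(f"{name}: no resource limits")
    return gaps


if __name__ == "__main__":
    # Hypothetical manifest with a single bare container.
    sample = {
        "spec": {
            "template": {
                "spec": {"containers": [{"name": "demo-service", "image": "demo:latest"}]}
            }
        }
    }
    for gap in find_gaps(sample):
        print(gap)
```

A check like this could run in CI so that a new service cannot be merged without probes and limits. Role-based access control is defined in separate Role and RoleBinding objects, so it is deliberately left out of this sketch.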

Starting point is 00:58:25 I think, if I now want to do a little highlight of what both of you just said: always figure out what the real problem is that you want to solve, by talking with people that actually have the problem, and do this in a large enough group that you know you're actually solving a problem not for one individual, but for a larger group of people. Not that we haven't done it, but we should have included even more people up front. And then, if you stick to Kubernetes, which both of you agree was a good decision, always have the security aspect in mind, because basically the earlier you address it, the easier it will become,

Starting point is 00:59:06 and you don't have all the technical debt, security debt, whatever you want to call it, to deal with later on. I think that's great advice from both of you. Brian, what have you learned today? Well, I learned a lot about Kubernetes that I hadn't thought of before. I learned about the history, which was really awesome. And I think what our listeners can learn is that, so

Starting point is 00:59:29 Andreas and Johannes, I challenged our listeners somewhat to a drinking game anytime Andy would mention Keptn, because invariably at the end of every episode, he'd be like, oh, and Keptn, plugging away because he loves it, you know, but I just thought it was humorous. Now, though, we've done a couple of episodes on Keptn

Starting point is 00:59:48 throughout the last year or so. I think this is a really nice in-depth one. I think what I learned the best is that whenever you're starting a project, don't be dogmatic about continuing to use what you started with. You have to be flexible and be willing to change. Look at, you know, and I think this goes for any development project.

Starting point is 01:00:11 I think this goes for anything in life, really, right? What is the goal that you're trying to achieve and where are you trying to get to? You're going to start out by picking some tools, some frameworks, anything to accomplish that. If you're going to be successful, though, you always have to ask yourself, what is the actual goal here? And is this approach and what I'm using helping me get there? And if not, abandon it now and find something else that'll

Starting point is 01:00:37 get you forward. The point is not to use Knative. The point is not to use necessarily even Kubernetes. The point is to get to an end state. And if you focus on that, the rest will fill itself in and you might have some trials that you have to go through to find the right things. But that should be the goal. Even speaking of our customers, their customers are the goal.

Starting point is 01:01:01 The goal is not to get into a microservices, containerized, highly scalable platform. The goal is to develop or to deliver the best experience for your customers. And if that means microservices and Kubernetes, then great. If the monolith is doing it perfectly fine and you can improve upon that, great.

Starting point is 01:01:23 So always keep that end goal in mind and don't be inflexible. Those are my takeaways. Very cool. Hey guys, thank you so much for getting on the show. I think it was the first time for both of you, and just talking into a microphone is not always that easy, but I think you were perfect guests

Starting point is 01:01:45 because you have a lot of experience that you built up over the last year and a half. And I wish you all the best with Keptn and the path that is still ahead of us. I'm pretty sure we will do great things here and make an impact. And hopefully we'll also grow the community more and more, so that more and more people contribute, it grows faster, and everybody's happy in the end. And world peace. And everybody's happy. And people can find out more about Keptn on keptn.sh. Now, one thing: I don't know if it's still a problem, but I know some people were blocked from the site because of the .sh extension, which might be another lesson learned for you all. I don't know if that's still widespread.

Starting point is 01:02:31 Any other resources they should look at besides, obviously, keptn.sh? There's tons of great tutorials on there. On keptn.sh, you'll find there's a Slack channel if you want to get involved and interact with people like Johannes and Andy and Andreas. There are some meetings you can get on to as well, bi-weekly meetings. A Twitter channel, it's all on there. We'll put links to all the stuff in the description. Anything else that people should look at for resources online, or is that the bulk of it? I think that's a good start. To be honest with you, go to GitHub and star us and contribute.

Starting point is 01:03:12 That would be great. All right. Thank you both for coming on the show. If anybody has any questions or comments, you can reach us at pure underscore DT on Twitter. You can also send us an old-fashioned email at pureperformance at dynatrace.com. I just didn't know if it was pure underscore DT.

Starting point is 01:03:34 Yeah, pureperformance at dynatrace.com. If you have ideas for the show or you think you might want to be a guest, please reach out. Thank you all for listening. We really, really appreciate being able to do this. I know Andy and I learned so much by doing the podcast, and it's because you all keep listening that we get to keep doing this. So we really thank our listeners a lot. And last but not least, you

Starting point is 01:03:58 know, big, big thanks to Johannes and Andreas for coming on and sharing this, and for giving a really great overview and history lesson on Keptn, because it's really fun stuff. So thank you. We have to say thanks for giving us the chance to talk and to share our learnings on Keptn.

Starting point is 01:04:17 Thank you also from my side. It was an honor. Thank you. Until next time, we'll see you all soon. Bye-bye.
