PurePerformance - What we have learned about K8s and Open-source when building Keptn
Episode Date: October 19, 2020
Keptn is now a CNCF sandbox project bringing a new event-driven approach to continuous delivery and operations. While many are just hearing about Keptn for the first time, it is interesting to learn more about how it started, which challenges the team ran into, what they learned about K8s, and running an open-source project. We therefore invited Johannes Braeuer (@braeuer_j) and Andreas Grimmer (@grimmer_andreas), both Keptn project maintainers and contributors, who have been working on the Keptn project since its inception.
Groups that want to start open-source projects, or are on the brink of deciding for or against Kubernetes, should especially listen until the end, as Johannes and Andreas tell us what they would do differently if they started today, based on the learnings from the past 18 months.
If you want to join the Keptn community, make sure to star our GitHub project, join the Slack channel, and join our regular community meetings!
Keptn: https://keptn.sh/
Johannes Bräuer on Twitter: https://twitter.com/braeuer_j
Andreas Grimmer on Twitter: https://twitter.com/grimmer_andreas
Keptn GitHub: https://github.com/keptn/keptn
Keptn Slack: https://keptn.slack.com/
Keptn Community: https://github.com/keptn/community
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and as always I have my lovely, beautiful and talented co-host Andy Grabner with me today.
Andy, how are you doing today? Are you feeling lovely, talented and beautiful?
Well, I thought it's handsome for male, isn't it? I mean, I'm not the native speaker in English, but isn't that the way it works?
I'm stealing English back for myself. I make the rules here. I'm an American, I make the rules. It's okay.
Yeah, you learned something. Okay. No, everything is good here, and I think it's my second or third podcast recording now with the new microphone, and I hope people hear the difference.
I approve. And I'm sure it's a bit of an easier setup for you than having those power sources and all that.
So for anybody who's into the audio stuff, Andy was using a Zoom, the Zoom H2n microphone, which is great for field recording, but as a sort of home base recording mic it's a bit cumbersome because it's got batteries and all the other stuff.
So what do you have now anyway?
I got the Razer.
Razer.
Who makes that?
Is that Motorola?
I think it's Razer.
I don't know.
No, I think the brand is Razer.
Okay.
Whatever it's called.
I don't even know what it's called.
I just installed it, plugged it in, and it worked.
I'll have to search it later.
Yeah.
It was sent to me by, thanks and a shout out to our lovely colleagues from the Dynatrace marketing team for recording the Dynatrace Go sessions.
We got sent some better equipment, a better camera, and also a better microphone.
So that's what I'm using now.
Oh, I wonder if that's what I saw Alois the other day.
He was actually shopping for microphones.
He was having some issues.
And no, no, it wasn't Alois.
It was Michael Kopp in our training.
And his camera looks phenomenal.
I was like, holy cow.
I bet you it's because of that.
That's awesome.
I need to find out what that is,
because that might be nice to have.
Anyhow, we're talking about tech gear.
Everyone's working from home, so things to do to improve your sound or your video on those remote meetings definitely helps, I think.
So I don't think this is a completely wasted conversation as an intro. But since we've been wasting time: I forget the poet, but there was a line, "Oh Captain, my Captain."
Right?
That's an old famous line from an old famous poem that I can't recall.
Obviously, English poem, but we're American.
But anyway, I'm rambling today, Andy.
Save me.
Sure, I'll save you.
Well, as you said, normally you scold me for using Keptn too often in a podcast, but I think today it's allowed to use it a couple more times than usual. We actually got two of the core team members of the Keptn team with us today, and the reason why we brought them on, well, this is something you will hear later. But before we get started, let's just give them a chance to introduce themselves. It's Johannes and Andreas.
So my first question to you is very straightforward: who are you guys? And you may want to start with Johannes first and then Andreas. Who are you?
Hi, Brian and Andy. Many thanks for this invite. I'm Johannes, I'm a maintainer of the Keptn project, and I've also been working in the innovation lab of Dynatrace now for two years.
And yeah, it's a lot of fun working here and enjoying every minute I'm in the office
and I'm working for the innovation lab.
Thank you.
Hi, also from my side, I'm Andreas Grimmer.
And I've also been working for Dynatrace for about one and a half years now.
And I'm also looking forward to sharing some of my learnings from this time working on an
open source project.
Perfect.
And folks that may have read the abstract to the podcast, you probably saw the similarity
between our two names, Andreas Grabner and Andreas Grimmer. So in case you ever write us an email
or find us online,
it's also odd for us to have names
that are that close.
Even the last name is very close.
The way we typically kind of get ourselves apart
or kind of differentiate ourselves
is with the first name pronunciation.
So Andreas goes with Andreas
and I go with Andy.
So in case you ever come across us,
that's the way we make sure
we address each other.
I thought it would always be
if you two are together,
you two would have to both salsa dance
and then we'd be able to tell
which one's Andy,
which one's Andreas.
Because Andreas,
I'm just assuming
you're not a salsa dancer like Andy.
And if you are,
you're not the same caliber.
But anyway.
That's true.
And hey, I just want to point out, you two have been talking about how long you've been with Dynatrace.
And the only reason I bring it up is because it's the end of September.
Andy Grabner, this marks nine years for me, which I think is about, you're a little longer than that, I think, right?
I got 12 and a half, yeah.
About the same.
Anyhow, and I love working here too.
It's awesome.
Anyhow, I just wanted to point that out.
So for the audience,
the reason why we got these two guys on board
is obviously we will talk a little bit about Keptn,
but the initial thought was to really talk with a group
that has been in the Kubernetes space now
with a project, with a service,
with an application for a year and a half.
And we really wanted to talk a lot about the learnings. What have you learned? What have you
done in the beginning? What made you make certain decisions? Which decisions would you have made
differently? Looking back, because we know a lot of organizations are investing in Kubernetes,
they're starting to build applications. And I think we see there's a lot of learnings,
and I think we have our learnings,
and that's why we want to share those.
And Andy, I'm interrupting you all the time today.
I think this also ties back to,
there was a podcast we did a while ago
about lessons learned or things to learn
and consider when trying to go for open source.
And I think reading over the notes of this,
some of this can apply for that as well.
Because obviously, besides the work being done with Keptn and all those
considerations,
there are some of the challenges that have gone into this,
that we'll reveal during the episode, considerations about the whole open
source thing and how sometimes you do have to change things and you're like,
okay, we got this far, but now we know we've got to go back and make a major change
in order to keep it going forward. So keep in mind, for anyone working on an open source project,
there are some other good notes in here for that as well, I believe.
Yeah, but maybe Johannes, as you introduced yourself as kind of one of the core maintainers,
can you kind of take us back a year and a half ago about when this all started?
How did you start and why Kubernetes
and what are the initial lessons learned?
Absolutely.
Because I think it's important to understand
how everything started
so that we can also then better explain
our learnings that we had along our journey.
And for answering this question,
I think we have to go back in time
because it happened two years ago
that we in the innovation lab,
we had the goal to bootstrap a workshop.
A workshop that helps our customers and partners
to deal with the complexity
and with the challenges that come with the cloud.
As you all might know, Dynatrace is a SaaS company, and on our own journey to move workloads
from on-prem to the cloud, we also learned a lot of things.
We developed best practices, we applied certain strategies and learnings, and those we also wanted to deliver to customers.
For example, we collected all the best practices around automating the monitoring, about performance
testing, then also setting up quality standards for your delivery and also applying deployment
strategies.
And this we all bundled into a workshop.
And then we delivered this workshop to participants that were joining a workshop.
And the outcome was always great because at the end of such a workshop, the participants, they were really amazed by what we showed them because we had hands-on labs,
then small demos, and they could really feel how it works to deal with cloud tasks.
And at the end, they all were very motivated to also integrate those components into their own organization.
But after a couple of weeks, they came then back to us and said, well, we wanted to get
started.
We wanted to set up performance testing.
We wanted to set up quality gates in our pipelines, but we are missing a very central
component, a component that helps us to get started.
And this was then the kickoff of the Keptn project,
because with the Keptn project,
we announced, or we are still working on, a framework
for automating continuous deployment tasks
and also going beyond deployment
and helping to automate operations.
And this was then the kickoff.
This happened in January 2019.
And since then, we have been constantly improving and evolving the Keptn project.
This was the intro and then what happened two years ago.
Yeah.
And I think it's great that you bring us back because I also remember, I think it was in
the, was it in the hotel?
What was it called again?
Next to our office.
Donavelin, right?
I think that's what.
Yeah, in Donavelin.
I think it was in Donavelin, if I remember.
Right.
Yeah.
And we had all the folks together and we were basically using Jenkins for automating a couple of tasks.
Then we were, as you said, putting things together and then said, well, this works great with our sample app.
And now we know the best practices and how it can look like.
And you're completely right.
Then basically, we said, now let's implement this.
And we said, yes, it worked well for the demo app.
But how can we get this on the road for real, right? How can we build something that actually works for everyone out there, not just for the sample app that we picked?
Yeah, true.
Yeah, cool. So the next question that I have, and I think this is also important to understand, because as you said, Johannes, in order for the listeners to understand why we made certain decisions or why you made certain decisions, it's also important to understand what the product, what the tool is all about.
Maybe Andreas, kicking it over to you.
While Johannes has already covered a little bit of the use cases, can you go a little bit deeper into what the use cases are that you guys are addressing?
Sure, yeah. So as Johannes explained, Keptn addresses two use cases, namely the deployment, testing, and evaluation of artifacts, as well as operations tasks and automating them. So Keptn now provides you a framework
in order to implement this.
And fortunately, you do not have to start from scratch here.
Instead, Keptn gives you lots of best practices
which are already built in.
For example, one of our best practices
is the so-called quality gate.
And this quality gate allows you to evaluate
the quality of your microservices in a fully automatic way.
So therefore, we are using well-known concepts
like service level indicators and service level objectives.
And the service level indicators can now be any metric which you get from your
monitoring solution, like Prometheus, Dynatrace, you name it. And using the service level indicators,
you can then formulate service level objectives, and basically using these objectives we can then
build a quality gate. So this is really now one best practice which is built into Keptn.
And then we can use this result of the quality gate, for example,
in order to decide whether an artifact should be promoted into production or not.
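Editor's note: the quality gate idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Keptn's actual SLO file format or scoring logic: the names `Objective`, `evaluate_quality_gate`, `pass_threshold`, and the two example metrics are hypothetical, and the SLI values are passed in as a plain dictionary instead of being fetched from a monitoring tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Objective:
    """A service level objective: a pass/fail rule for one SLI (hypothetical shape)."""
    sli: str                          # name of the service level indicator
    passes: Callable[[float], bool]   # rule the measured value must satisfy


def evaluate_quality_gate(
    sli_values: Dict[str, float],
    objectives: List[Objective],
    pass_threshold: float = 1.0,
) -> dict:
    """Check every objective and decide whether the artifact may be promoted.

    The score is the fraction of objectives that passed; this simple scoring
    is an assumption made for the sketch.
    """
    results = {o.sli: o.passes(sli_values[o.sli]) for o in objectives}
    score = sum(results.values()) / len(objectives)
    return {"results": results, "score": score, "promote": score >= pass_threshold}


# Example objectives (assumed metric names and limits).
OBJECTIVES = [
    Objective("response_time_p95_ms", lambda v: v < 500.0),
    Objective("error_rate_pct", lambda v: v < 1.0),
]

if __name__ == "__main__":
    measured = {"response_time_p95_ms": 420.0, "error_rate_pct": 0.3}
    print(evaluate_quality_gate(measured, OBJECTIVES))
```

The point of the sketch is the separation Andreas describes: the SLIs are just named numbers from whatever monitoring solution is in use, while the objectives and the promotion decision are defined independently of that source.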
Very cool. And Andreas, thanks for that explanation.
I mean, obviously, for me, this should be nothing new because we're working very closely on this.
And the concept of the SLI providers, of pulling data in from different data sources,
I think is also what attracts a lot of people to Keptn, because Keptn, as an open source project, is agnostic to the underlying data source.
And I really like the fact that you architected it in a way that you can easily
replace components such as the data source. And I want to just also do a quick shout out for people
that want to learn how to build their own SLI data source. I'm giving a talk at the Neotys PAC event
that happens early October.
By the time this show airs,
I think it's probably around the same time.
But anyway, the recording should be out there.
I'm actually going through the use case
on how to write your own SLI provider
because we have seen users out there asking,
so how can I pull in data from Splunk?
How can I pull it in from, I think, Wavefront?
Intuit is building an integration there.
How can I pull data in from other APM tools?
And so I want to show how this works.
Andy Grabner, can I ask you a question about that?
Of course.
I think it's all related.
So one of the things I've noticed in Keptn, and again, I have very limited use of it, but usually we're saying a data source, pull from Dynatrace. Now Keptn can pull from multiple data sources, right? So the benefit of what you're describing is that, all within the same test, you'll be able to pull metrics from multiple data sources to do your evaluation.
Is that true?
So right now, and guys, correct me if I'm wrong, but right now we have a kind of one-to-one
relationship from a Keptn project to a data source.
So that means you can set up different projects and then, you know, for every project
you have different data sources.
You can say, I'm running one for my monitoring data and then one for my testing data.
So this is one way of doing it.
The plan is to support, in the future, multiple SLI providers for the same project.
But for this, I don't want to go too far out and promise too many things.
This is where I hand it back to Andreas and Johannes to see where we're standing with these multiple SLI providers.
Yeah, you're absolutely right, Andy.
For pulling in data from different sources, you have to split those based on projects.
That is what you can use as of today.
But plans are going forward and we want to also include other sources
into the same testing process and evaluation process.
And I think the reason, and again, Johannes, correct me if I'm wrong,
why we've been kind of dragging out these capabilities
is because what we have seen is that a lot of organizations
do have a data source where a lot of data ends up anyway.
So with our Dynatrace customers, they're pushing all the data into Dynatrace first,
and then you can get this data out.
Like we were doing a lot of work with the Neotys folks.
They have their own SLI provider, yet they're also pushing data to Dynatrace.
So that means through the Dynatrace SLI provider, you can get all this data.
Or Sumit from Intuit, right? They are pushing all the data into Wavefront, and then they're pulling the data from there.
This is why we've also kind of dragged out the decision to focus on this feature earlier, and focused on other things first.
It does make sense too, because the idea of putting all of your data in one source
makes a lot of sense for maintenance and cleanliness and all that.
So yeah, good points.
I was just curious there.
Thanks.
Yeah.
So I've got a follow-up question here, and I think this also brings us even closer
to some of the decisions you made.
But we talked, and Andreas explained the use cases, explained SLIs and SLOs. I talked about the
openness. How does Keptn actually work then? Maybe this is a question for Johannes, probably.
How does it work? How can you actually, and I think this is what we're promoting, separate
the concerns between process definition and tool definition? How does this all work internally,
and why is it so flexible that you can actually
extend Keptn as you like?
Yeah, that's a really good question, because in its core, Keptn
follows an event-based architecture. This means that Keptn itself
receives events from the outside, then it analyzes this event and then kicks off a process,
a process like the delivery or like remediation.
And then Keptn takes care of orchestrating all these events
that then are also sent out by Keptn and received by Keptn.
And this is a way that we or that other tools can be plugged in, because they just have to listen to certain events.
And whenever they receive one, they do their job and then they just respond with a finished event.
And this is then the trigger for Keptn to go on and to do the next orchestration step. I mean, I always bring the example
of the continuous delivery pipeline
when I explain the event-based approach of Keptn
because, as you know,
continuous delivery always has testing in mind.
And when you kick off a delivery process,
then Keptn will at some point
send out a test.triggered event. This will then be
picked up by a tool like JMeter, Selenium, or any other testing tool. Then the tool does its job,
and when it's done, it just responds with a test.finished event. And finally, Keptn then
goes on and takes care of executing the next step in the delivery process.
And bringing this flexibility allows customers to also plug in the tools they already have in their organization and in their operational or development processes.
And yeah, this is the great flexibility that comes with Keptn. And maybe just one additional
thought for understanding how Keptn is working from a technical perspective: we
split Keptn into two components. One we call the control plane, which is taking care of all the
eventing mechanism, and the other one is the execution plane, which is then responsible for executing the tasks
like testing, like the quality evaluation, like doing a remediation action.
And as of today, both planes are running on Kubernetes, but we are planning and already implementing features
to also let the execution plane run outside of Kubernetes.
But yeah, as of today, Kubernetes is the underlying platform
that Keptn is running on.
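Editor's note: the event flow described above (a control plane that sends out a triggered event, and loosely coupled tools that do their job and answer with a finished event) can be sketched as follows. This is a simplified, in-memory illustration, not Keptn's implementation: the class `ControlPlane`, the helper `make_service`, and the short event names such as `test.triggered` are assumptions for the example, and real events would travel over a messaging layer instead of direct function calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A handler receives the name of a triggered event and returns the name of its reply event.
Handler = Callable[[str], str]


class ControlPlane:
    """Orchestrates a sequence of tasks purely by sending and receiving events."""

    def __init__(self, sequence: List[str]) -> None:
        self.sequence = sequence  # ordered task names, e.g. ["deployment", "test"]
        self.subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register an execution-plane service for one event type."""
        self.subscribers[event_type].append(handler)

    def run(self) -> List[str]:
        """Send '<task>.triggered' for each task and continue only after '<task>.finished'."""
        log: List[str] = []
        for task in self.sequence:
            triggered = f"{task}.triggered"
            log.append(triggered)
            replies = [handler(triggered) for handler in self.subscribers[triggered]]
            log.extend(replies)
            if f"{task}.finished" not in replies:
                break  # nobody finished this task, so the process stops here
        return log


def make_service(task: str) -> Handler:
    """Build a stand-in for a tool (for example a test runner) that handles one task."""
    def handle(event: str) -> str:
        # A real tool would do its work here before replying.
        return f"{task}.finished"
    return handle


if __name__ == "__main__":
    keptn = ControlPlane(["deployment", "test", "evaluation"])
    for name in ["deployment", "test", "evaluation"]:
        keptn.subscribe(f"{name}.triggered", make_service(name))
    print(keptn.run())
```

Because the control plane only knows event names, swapping one testing tool for another means registering a different handler for the same event; the process definition itself does not change.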
Very cool.
Hey, Johannes, I got a question here.
So the way, or maybe a confirmation, because the way I always try to explain it, basically what you just explained to me, is we bring a concept into continuous delivery that we have seen from architects that are designing modern applications. That means you have loosely coupled components, you call them the execution plane, right? You have loosely coupled services, and they all talk to each other through
events, and there's obviously one component that makes sure that the right events are sent at the
right time. It's like a business process engine that is orchestrating a process by sending the right events at the right
time, and then you have, you know, loosely coupled things that can also be replaced very easily, that
can then actually do a task. Is this the right way to understand it?
Absolutely correct, yeah, absolutely correct.
Also important to know or to understand is that Keptn follows the principle of separating the process, which can be a delivery process or a remediation process, from the actual tooling, so that you can easily plug in different tools depending on your ecosystem and also on the tools that the cloud provider offers you.
Very cool. Now, so I think we talked a lot about, you know, the use cases and how Keptn is working. So I think by now, listeners, if you're still here,
you should hopefully understand what Keptn does, and if not, then we really need to figure out
what we do wrong with explaining it. There's a lot of other material out there too. If you want to
catch up, there are videos, there are tutorials, you can go to tutorials.keptn.sh, and, as we said, there's a
YouTube channel with a lot of additional information. But now let's really get into
the lessons learned. I want to really understand how we got from where we started,
after we left that room in that hotel where we had a demo
environment, to where we are now. How did we get started? Because
it's a big problem that we had to solve, and I guess we had different, we call it
Baustellen, you know, in German, we had different construction sites, different things that we had to attack.
So maybe Andreas, I think it's time for you again.
What are the steps that we took back then, and the lessons learned?
Yeah, I can confirm that it is definitely a long way from being a demo to being a real project.
So I would especially like
to explain our installer to you, because this installer really nicely
describes the maturity of the Keptn project.
So we started, of course, with a bunch of shell scripts, as everyone does.
These shell scripts were executed by the users locally.
And as you can imagine, we ran into lots of problems.
So we had lots of dependencies on external tools
at specific versions and so on.
And if such a tool wasn't available, our demo did not work.
So basically, this was the first generation of Keptn, 0.1:
a bunch of shell scripts.
In the second generation,
we thought, there is no way, we have to do something about it.
So we basically containerized the complete installer. So we got a Docker image,
which was then executed by a Kubernetes job. And this Kubernetes job basically set up our
complete demo environment. And this fully automatic installation was great for doing demos
because we automatically installed Helm with Tiller,
because Helm version 3 was not available.
We installed Istio, we installed Knative,
and we even set up some virtual services and gateways
in order to access our UI and also the API.
Yeah, but as I said, we had lots of dependencies.
And you can imagine not every Keptn user really would like, for example, to have Istio installed.
So this basically kicked off our third generation.
And here we basically put the installer on a radical diet.
So we sat down and did a re-evaluation of our dependencies.
And for example, let's pick Knative now.
So in the first place, Knative looked like a really nice match to Keptn
because Knative Eventing was the
perfect tool in order to manage, subscribe to, and also deliver our Keptn events.
Also, Knative Serving was really cool because this would even allow us to scale our services
down to zero and to save resources.
However, we found out that Knative was definitely not the right choice for us.
First of all, it took so many resources that we, for example, needed 16 virtual CPUs for only doing a demo setup. And honestly, Knative was also too unstable at this
time, because
we always made the joke that we
spent much more time debugging
Knative than Keptn itself.
So my learning here is definitely: do in-depth research of your dependencies, especially if you're using dependencies which have a version below 1.0.
So really be careful here.
And I can continue the story now. So we were even able to remove Istio. We were able to remove Tiller because Helm 3 was there. And basically we ended up with an installer which only consists of a single Helm chart.
And this Helm chart now allows the user to configure,
for example, the services and how the API is exposed.
And this was a great move, which we did last year.
So Andreas, a quick question here, a recap,
because I think you just said a lot of interesting things
that I also vividly remembered as we went through that kind of progress of maturing.
A lot of people will run into the same thing.
So I like the installer story
because the first time we tried to install everything
through shell script from the outside
and basically installing things,
probably using kubectl commands that we pushed,
that we basically put together in shell scripts, right?
It was all kubectl and doing this and this.
And they were probably very lengthy shell scripts.
That's right.
The next thing: instead of doing it from the outside, we baked it into a container
and then let that container run from within the Kubernetes cluster to avoid, you know,
problems with not being able to execute shell scripts correctly. And then we basically put this into an installer job.
But then the other thing that I thought was fantastic,
and I want to make sure that people don't think we are bashing projects
that are not at version 1.0 yet, because remember, guys,
Keptn is also still at 0.something.
But there's obviously a price, quote unquote,
to pay if you are jumping on a new technology or framework early on.
And you are also playing, you know, a tester in most of the cases.
And that's what we did with Knative.
And I'm pretty sure Knative has matured by now and is a great
framework for use cases where it really makes sense, right?
I think that's what I wanted to say.
Yeah, I'd add to that, Andy,
because I used to play with Keptn in some of the earlier days.
And it's funny that you mention the bit with Tiller and Knative,
because I remember going through some of the exercises, and especially once I got
to the Tiller part,
there would always be some error going through
and I'd start looking up what my environment had wrong,
if there was a permission.
And I remember in the earlier days thinking like,
how is this gonna be usable if this is happening?
And you're describing exactly,
you just described exactly what happened to me
is what you all discovered and said,
well, it's because of these dependencies,
let's get rid of these dependencies.
And I think that just is a very strong action to take to say,
this is just not going to work this way. Let's, let's change it now while we can.
And you have to make those decisions,
which sometimes can be tough because you look at, you know,
I don't know how much work was involved in that decision,
but you'll come to the point where you say, either we're going to
keep having these kinds of issues, or we put in the work now and
save on it later. So kudos to that.
Alright, I'm sorry
that I interrupted you, but I think you wanted to go on with some other stuff.
No, I think the installer really shows how we got from a demo setup to a project setup.
So there are so many steps which you have to take to evaluate your dependencies, and also, of course, you need to get your features
mature.
Sure. I want to then touch on one more thing, though, because I know in which state we are
right now. So as of the time of the recording, we have version 0.7.1 released. You just mentioned we provide the Helm option to install Keptn for people,
especially, I think we have a couple of customers or users that are using air-gapped systems,
where they want to first download all the images and then they want to use Helm to deploy it
after they've added and analyzed all the images.
I think that's great.
But I also want to bring up one thing and I want to get your kind of public opinion on this.
We've reduced complexity on our end, as you said, by only installing things that are really necessary to Keptn.
We don't install Istio anymore by default.
We don't configure virtual services.
So we basically leave this to the end user depending on how they want to use Keptn,
because we assume if they want to do blue-green deployments with Keptn,
then they may already have Istio and we use it.
They can decide how to expose Keptn to the outside world
because they already have an ingress.
Now, the point that I want to make here,
while we made all these steps, we assumed,
or we kind of pushed the responsibility of the things
around Keptn, everything that is necessary for something
to run in Kubernetes and to be exposed to the outside world.
We kind of pushed this away from us.
And we assumed that our users know about these things and they can provide these things and they know how to work.
But the reality, at least for me, it seems that we see a lot of people just getting started with Kubernetes. And now they're struggling also with these,
let's say, quote unquote, basic concepts
that we have assumed people know.
I think I wanted to hear your thoughts on this.
If maybe I just see this because I
work with a couple of users, or if you also see this.
Yes.
I would say we have sharpened the focus of Keptn.
So Keptn shouldn't be the tool, for example, which manages your certificates.
So yes, I completely agree that this challenge is now pushed basically to our users.
But I also think that this is the right place where this should live. So in the
management of the certificates. Before we've used, for example, Istio in order to expose services.
And as you can imagine, a framework like Istio also brings lots of complexity into it. So going back and using Kubernetes primitives
like services with the load balancer,
node port,
or using a Kubernetes ingress
is sometimes easier
than using Istio with gateways
and virtual services.
So, yes,
but of course, we also provide some quick starts on how to set up your production environment with Keptn.
So I wouldn't say that the user is now alone.
So there is always help from our side.
But maybe this is a little bit more in the documentation now.
Yeah.
Can I bring in one
learning from my end?
Because it also
relates to this question or to
this discussion
because I think you can never make assumptions
about the environment you will be
deployed on.
This was also one thing that we
had in our first version of the installer
or also in the second generation. We made too many assumptions about the environment
that we deploy Keptn on. And we also learned that this is not true, and it is not the
case that we can assume this or that. It always depends on the user and also on the setup the user has available.
Because there is, for example, the OpenShift group.
Then there is also the group that has an air-gapped system
where you have no internet connectivity to the outside world.
And then there is the group of the users
that have access to a regular Kubernetes deployment
like in Google Cloud or AWS.
And there is that much difference
between those groups
that you never can make any assumption
about the environment.
Yeah, I think that's a really, a really good point.
Never make assumption in which environment you end up running.
And also, I think one additional point that I want to make, because, you know,
to be honest with everybody out there, to be frank, I had no clue about Kubernetes a
year ago, and I'm still struggling with all the basic concepts because I simply
was never trained on it. Even though I have a basic understanding
of networking, I'm completely blank when it comes to, you know, routing and certificates.
This is just something that is definitely not my stronghold. But now if I
want to go towards Kubernetes, and whether this is Keptn or anything else, I believe the big lesson learned for me as a consumer, as a user of the Kubernetes ecosystem, is that I
need to be aware of these things because I'm all of a sudden, I think, responsible for
them, unless I work for an organization where they provide Kubernetes as a service to me.
But then most likely this Kubernetes is so locked down or so special that it's again
hard to get specific software in. So I think the lesson learned for me is learn, learn, learn.
You need to understand Kubernetes, and that also includes networking, routing, security. I think we
all need to have a basic understanding, because otherwise there's a lot of struggle on every side here.
All right.
That was my little thing, my advice and lessons learned from my side.
You sounded like me there for a moment, Andy.
I know, I know, I know.
Right?
No, but I'm completely, I mean, Kubernetes is amazing and I think it opens so many doors,
but I think we also need to understand it.
Brian, I think we both come from a Windows background.
Just because I know how to launch Windows 10
and know how to open PowerShell
doesn't mean I know how to properly operate a Kubernetes cluster.
I installed Windows 2000 from 3.5-inch disk, so that puts me in another class.
So I think, guys, great lessons learned.
And I know there's a lot of different venues, as you said, there's a lot of stuff in the
documentation where we help guide people through things
that they need to know and need to install.
Now, shifting a little bit to a different area,
there was a strategic decision, I assume,
to make an open source project.
And I was wondering how does this all work out with an open source project?
Is this more beneficial in the end than...
Is it worth the effort going down the open source route?
Is it slowing you down because there's certain things and processes in place?
What are the lessons learned from actually running an open source project?
And in this particular case, a CNCF sandbox project.
I can get started.
I think the first really cool thing is that the community of Keptn consists not only of the core developers, because there are also the users out there.
The users that try out Keptn, that find a bug and report the bug, and they also bring in new ideas for features and
enhancements. And this is really what makes working for an open source project fun and very, very cool.
I think this is one learning from my end: bringing the user, the actual user, closer to the product is definitely a
positive aspect of
running it as an open source project.
Very cool.
Any other thoughts? Andreas
maybe from an open source perspective?
So
Keptn is intentionally
designed to be an
open system which allows
custom integrations. A few minutes ago we talked about
SLI providers. So we have one for Dynatrace, we have one for Prometheus, but now the community,
really the community of Keptn, is building an SLI provider for Wavefront and other tools. And this really allows us to
scale Keptn. Otherwise,
if we weren't an open
source project,
it would simply be impossible to
do every SLI provider on
our own. So
I see great benefits
in providing
a great ecosystem
which addresses then lots of use cases.
Yeah, I also remember I had actually a call
earlier this morning with a user
and he was struggling with a Dynatrace SLI integration
that I actually wrote the code for.
And he said, you know what, I pinged you earlier
because I wanted to get some help.
And we went on the call and then he said,
you know what, I just looked up the source code because it's available anyway.
And so I found my way around.
I understand now how it works.
So I think this is also a great thing about having this out there in the open.
And people can just, if they're not afraid of code or looking at other people's code, also help themselves or build integrations, as you said.
Yeah.
I totally agree.
And I had the same story in my mind, that users really are pointing you to a
line of source code that contains a bug.
And then it's really an easy game to fix the bug and to deliver a new feature.
And this would not be possible in a closed source product. And yeah, this helps in engaging with
the users, and also helps the users to engage with the Keptn team. I think it's
a win-win situation for both the users and the developers of Keptn.
So, great stories, but I didn't know that
there are bugs in our source code. I think we have a colleague
who calls them opportunities, right? A lot of opportunities in our source code to make it
better.
Exactly.
Wow, that's some
serious HR type action there.
I know, it's called an opportunity. No, it's an opportunity.
Wow,
you're going to have to remember that one.
Yeah. Hey, I know, you know,
you guys have been doing this for a year and a half. Remind me,
since when have you been engaged with CNCF?
Engaged with CNCF? We kicked off the start in August 2019.
That's when we started to write our proposal for CNCF.
It was in August last year.
Perfect.
And then I think earlier this summer or maybe in the spring, we got officially accepted
as a sandbox project, I believe.
True.
Yeah.
Cool.
And now we are on the road to, what's the next step?
Incubation, I think, is the next.
Yeah. Now one question, and this, I think, Brian, with whom did we discuss open source projects
and kind of how to run them and lessons learned?
That's a good question. She
was from Google, I believe.
Yeah, I think she was from Google and she talked about open source projects.
Now, I know that both of you and also the rest of the team are not only active in our own Keptn community,
but I believe, Andreas, you are also a part of the CDF, right?
You're an active member of the Continuous Delivery Foundation.
How does that go?
Right, this is a
similar organization to CNCF, and I'm also a member of a special interest group on interoperability.
And this is now a perfect match for Keptn because Keptn really tries to address these interoperability problems using events.
And therefore, I'm a member of this group.
And for example, we are currently working on a white paper, which really states the
advantages of using event-based systems also for continuous delivery.
Oh, that's cool. Yeah.
And I think the reason why I wanted to bring it up,
I believe this is also very important that when you are running an open source
project and you hope for external contributors, which obviously is one
of our goals in order to grow an ecosystem, you need to have external contributors. But I think you also need to give back and kind of contribute to other communities.
And this is why I wanted to highlight that you are part of the CDF.
And that has obviously benefited the CDF.
It has also already benefited us because they gave us a podcast.
We're now speaking at the conference and we are in constant exchange.
So I think as a best practice for everybody out there that wants to start, don't just
think about your own open source project, but if you want to make it big, also contribute
to others because in the end, it's a global community and we need to cross-pollinate or
whatever you want to call this.
I'm not sure what the right word is for that.
One other topic,
quickly,
staying on this with how to grow
an open source project.
We do have external contributors, right?
There's a couple of people
that have contributed already.
True, yep. We have a couple
of folks out there that provide contributions almost on a weekly basis.
And I know there was Imre. I think he was
great, and I think his story, maybe
you can even tell it better than I can, the way he started with Keptn and why.
Yeah, he took a look at our issues
and we have a couple of those marked as good first issues.
Those are issues that we consider as good,
yeah, as candidates for getting started
and for getting involved in the dev process of Keptn.
And he picked one of those.
He assigned himself to the issue, got it implemented,
and then he filed a PR
and opened a PR against the Keptn core code base.
And we approved it.
And then he was in.
He was then, yeah, this was his first contribution
and he then continued to provide other features as well.
Yeah, that's cool, because I remember I
had a conversation with him and I think he said he wanted to get into the open source space. He
wanted to contribute, but then he looked at Kubernetes as a project and he said it's so huge, and he didn't really know: should he contribute, I don't know, a README
change? But then he said no, he wants to actually contribute some code. So he was actually looking
in a space that is interesting for him, which is the space we are in with Keptn, but he also picked
a smaller project where it was easier to also contribute code.
And I think he was really helpful.
He was really happy with the way we kind of onboarded him on the community, the way he felt.
So that was great to see.
I think we exchanged some thoughts on the podcast that I recorded with him because he's also running a podcast in Indonesia where he's from.
That was pretty cool.
Andreas and Johannes, did we, to kind of round up the open source project discussion,
is there anything else that we have learned over the last year or so since we have been kind of,
you know, pushing this open source project, especially around CNCF?
Are there any things
that people that want to actually start maybe their own CNCF project should be aware
of? Things that we didn't anticipate in the beginning, you know, some challenges, some things
they require from us, just hurdles, or maybe even things that, you know, people should know
that these are things that they will have to do
if they want to become
a CNCF project.
That's one thing
that also we had to learn.
It's all about documentation first.
Well, you need to describe
what you want to implement
by your issue.
You also need to describe
what's on the roadmap
and you need to provide documentation
to make it even possible for someone to contribute.
And when I think back,
one year ago,
our issues, there was a one-liner in there
just explaining what needs to be done.
And now they are really nicely framed,
where we have an explanation:
what's the problem, what's the task, what's the definition of done.
And this then allows other people and also contributors
to understand what needs to be implemented.
Long story short, documentation first is definitely a learning
that you have to follow or you have to apply
when it comes to an open source project.
Anything else from you, Andreas?
Anything that is good to know if you go down the route of an open source project that you learned?
So I only can confirm what Johannes said, because you really have to learn that all communication is asynchronous.
So it would be best not to talk to each other,
but instead to write your thoughts on the GitHub issue.
We all know that this is lots of work,
and sometimes it's easier to discuss this directly. But otherwise, when you write it down,
it really gets transparent for
everybody and
then you can really see how
decisions are made and
you can influence
decisions which is really interesting
and
this was definitely a learning from my
side and that we have to
keep the documentation up to date
and keep
transparency.
So that means what you just described is a new opportunity for a new open source project
that is transcribing Zoom conversations or whatever thing and then automatically putting
comments on pull requests or issues with the right username attached to it.
I mean, that would be awesome, wouldn't it?
Because then you can have a conversation like we have, but everything is fully documented in Git.
Yeah, but then you've got to watch what you say.
It's true.
That's stupid. Andy wanted me to put this thing in here now.
Oops, I meant Grim or not Grim.
Yeah.
No, but that's it.
It's an interesting point, too, because
for so long,
there's been the idea that you have to get all
developers together in a room, right?
That working,
I think especially pre-COVID,
there was this idea that working remote in certain situations, especially with development teams, can be detrimental because they can't just get together and whiteboard things and hash things out.
And obviously, we're seeing in COVID that's not so much the case if you have good people. But I think this almost takes that idea one step further,
where the lesson from the distributed team that you're working with in an open source project
is to say, no, don't have a conversation with them, go even further away from getting in the room
and just type it all out. It sounds so counterintuitive, or at least it's so
different from what was being said a year ago or two years ago in terms of everyone has to be in the same room and hash out ideas.
That putting it into words, typing it out there, making it visible for all to see, and sort of maybe not necessarily slowing down the process, but removing from the process the idea of communication,
one-to-one direct communication, is really
interesting. And I
don't know how to account
for that, if it's applicable
more widespread than just
an open-source kind of GitHub situation
or not.
So it's kind of opening a new
idea here.
Yeah.
No, I think you're right.
I mean, if you are, first of all, you take,
even though there are obviously emojis and exclamation points,
but typically you probably take the emotion out of conversations
and therefore really have to think of what you want to write
and in a way that people understand. And I'm sure we've all been in that situation where you want to
say something right now, but then you kind of take a step back and wait a minute or two and then start
writing it down. Then the stuff that you write down often actually makes more sense and is clearer.
So I like it, but it's obviously a big change to the way we as humans have done collaboration together, especially in times when we've all been in offices.
Yeah.
I think also there's this idea everyone falls in love with, and I have no idea how true it is, that when people are together there would seem to be a lot more opportunities for those aha, those eureka moments.
As opposed to if I'm calmly writing down something in an email, then you're reading it, thinking about reacting to it.
There isn't that chemical interaction that occurs when people are in the same room.
You know, I think our bodies even
just undergo a change when you're in the same room, and who knows if that process makes the
thinking process different. Do breakthroughs not happen, or are they more likely to happen?
That's a research project for somebody out there. Are breakthroughs more likely to happen
if people are reading well-written, concise things that they can ponder on and think on before writing? Or how many times does one-to-one interaction in the presence of others
actually spawn a useful eureka or aha moment? So if someone's looking for a thesis paper,
go ahead and run with that one.
Hey Brian, I don't want to do a Summarator today because
I think maybe the Summarator is going on retirement
For a while
It seems so
Because guys typically
I'll try to summarize the things that I've learned
But I think I always kind of
Recapped on what you guys have been saying
In different sections of the podcast
Agile recap
Agile recap
Exactly but
The thing that I want to ask each of you individually,
and I want to start with Johannes, looking back
one and a half years, if you would start over now
from scratch where you were, where we were when we left
that room, with all the knowledge you have about Kubernetes, about open
source projects, what would you do differently now? Would you change the
architecture again? Would you pick different frameworks for certain things? Would you still
pick Kubernetes or pick something else? Or any other thing that,
one thing that comes to mind that you would change
and would do differently?
Johannes.
That's an excellent question.
First of all, I think I would still use Kubernetes
as the container orchestration framework
for deploying and running our framework.
But one thing that I would now do differently is that I would re-evaluate each and every
dependency you bring in.
Because each dependency you need to update.
They have security vulnerabilities you need to be aware of, which also force an update.
And at the end, each dependency also has a customer impact.
And therefore, you really need to reconsider what you bring into your product.
And maybe from a not technical point of view, but more from a product point of view,
I mean, we had great possibilities to talk to customers in an early stage, but I would
also bring in customers into the kind of brainstorming and decision-making phase right at the beginning.
Because at the end, you want to solve a customer problem or a user problem. And those problems must be written down on a whiteboard right at the beginning
when you start working on a project that solves a problem.
Thank you for that.
Andreas, anything from your end?
Yeah, I would also go with Kubernetes, but probably change some of the types we are using.
So some resources, for example, our shipyard file, could be a custom resource
in Kubernetes.
So using operators in order to control the lifecycle of these resources.
Maybe this would be a good design decision and maybe we will do it.
But we, of course, had made the decision not to make ourselves dependent on Kubernetes.
So Keptn should run anywhere.
And that was the reason why we stuck to this architecture.
And there are lots of small issues.
For example, introducing health checks,
resource limits, role-based access control.
You have to do all of this from the beginning.
Basically, do the design
with security in mind from the first place.
This is really a learning.
And if we were to start again now, I would definitely give this more consideration.
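To make the "small issues" named above a bit more concrete, here is a minimal sketch of a Deployment manifest that declares health checks and resource limits from day one, again built as a Python dictionary. The function name `hardened_deployment`, the image name, the `/health` path, and all numeric values are hypothetical placeholders, not settings used by Keptn. Access control is left out because it lives in separate Role and RoleBinding objects.

```python
import json


def hardened_deployment(name: str, image: str, port: int) -> dict:
    """Build a Deployment manifest with probes and resource limits set up front.

    Paths, delays, and CPU/memory values are illustrative assumptions only.
    """
    probe = {"httpGet": {"path": "/health", "port": port}, "periodSeconds": 10}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Health checks: restart the container if it hangs, and hold back traffic until ready.
        "livenessProbe": dict(probe, initialDelaySeconds=5),
        "readinessProbe": dict(probe),
        # Resource limits: reserve a baseline and cap what the container may consume.
        "resources": {
            "requests": {"cpu": "50m", "memory": "64Mi"},
            "limits": {"cpu": "200m", "memory": "128Mi"},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [container]},
            },
        },
    }


deployment = hardened_deployment("example-service", "example/image:1.0", 8080)
print(json.dumps(deployment, indent=2))
```

Adding these fields later means touching every service; declaring them in the first manifest keeps that cost close to zero.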
That's very good advice.
I think both of you, if I now want to do a little highlight of what the both of you just
said: always figure out what the real problem is that you really want to solve, by
talking with people that actually have the problem, and do this in a large enough group
so that you know you're actually solving a problem not for one individual, but for a larger group
of people. Not that we haven't done it, but we should have included even more
people up front. And then if you stick to Kubernetes, which both of you agree was a good decision,
always have the security aspect in mind, because basically the earlier
you address it, the easier it will become,
and you don't have all the technical debt, security debt, whatever you want to call
it, to deal with later on.
I think that's great advice from both of you.
Brian, what have you learned today?
Well, I learned a lot about Kubernetes that I hadn't thought of before.
I learned about the history, which was really awesome.
And I think what our listeners can
learn is that, so
Andreas and Johannes,
I challenged our listeners
somewhat to a drinking game anytime
Andy would mention Keptn, because
invariably at the end of every episode, he'd be like,
oh, and Keptn, plugging away
because he loves it, you know, but I just thought it was humorous.
Now, though, we've done a couple of episodes on Keptn
throughout the last year or so.
I think this is a really nice in-depth one.
I think what I learned the best is that
whenever you're starting a project,
don't be dogmatic about continuing to use what you started with.
You have to be flexible and be willing to change.
Look at, you know,
and I think this goes for any development project.
I think this goes for anything in life, really, right?
What is the goal that you're trying to achieve
and where are you trying to get to?
And you're going to start out by picking some tools,
some frameworks, some anything to accomplish that.
If you're going to be successful,
though, you always have to ask yourself, what is the actual goal here? And is this approach and
what I'm using helping me get there? And if not, abandon it now and find something else that'll
get you forward. The point is not to use Knative. The point is not to use necessarily even Kubernetes.
The point is to get to an end state.
And if you focus on that, the rest will fill itself in
and you might have some trials that you have to go through
to find the right things.
But that should be the goal.
Even speaking of our customers,
their customers are the goal.
The goal is not to get into a microservices,
containerized, highly scalable platform.
The goal is to develop
or to deliver the best experience for your customers.
And if that means microservices and Kubernetes,
then great.
If the monolith is doing it perfectly fine
and you can improve upon that, great.
So always keep that end goal in mind
and don't be inflexible.
Those are my takeaways.
Very cool. Hey guys, thank you so much for, for getting on the show.
I know, I think it was the first time for both of you, and
just talking into a microphone is not always that easy,
but I think you,
you were perfect guests
because you have a lot of experience that you built up over the last year and a half.
And I wish you all the best with Keptn and the path that is still ahead of us.
And I'm pretty sure we will do great things here and make an impact.
And hopefully we'll also grow the community more and more so that more and more people contribute, it grows faster, and everybody's happy in the end. And world peace.
And people can find out more about Keptn on k-e-p-t-n.sh. Now, one thing, I don't know if it's
still a problem, but I know some people were blocked from the site because of the .sh
extension, which might be another lesson learned for you all.
I don't know if that's still widespread.
Any other resources they should look at besides, obviously, keptn.sh, and there's tons of great tutorials on there.
On keptn.sh, you'll find there's a Slack channel if you want to get involved and interact with people like Johannes and Andy and Andreas.
There are some meetings you can get on to as well, bi-weekly meetings.
A Twitter channel, it's all on there.
We'll put links to all the stuff in the description.
Anything else that people should look at for resources online or is that the bulk of it?
I think that's a good start. To be honest with you, go to GitHub
and star us and contribute.
That would be great.
All right. Thank you both for coming on the show. Anybody
has any questions, comments, you can reach us at pure underscore DT on Twitter.
You can also send us an old-fashioned email
at pure
performance at dynatrace.com
I just didn't know if it was pure
underscore is the pure DT.
Yeah, pureperformance at dynatrace.com
If you have ideas for the show or you think you might
want to be a guest, please
reach out.
Thank you all for listening. We really, really
appreciate being able to do this
I know Andy and I learn so much by doing the podcast, and it's because you all keep listening
that we get to keep doing this. So we really thank our listeners a lot. And last but not least, you
know, big, big thanks to Johannes and Andreas for coming on and sharing this and giving a
really great overview and history lesson on Keptn because it's really
fun stuff. So thank you.
We have to say thanks
for giving us the chance to
talk and to
also share
our learnings on Keptn.
Thank you also from my side.
It was an honor.
Thank you.
Until next time, we'll see you all soon.
Bye-bye.