PurePerformance - How not to start with Kubernetes – Lessons learned from DevOps Engineer Christian Heckelmann

Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to another episode of Pure Performance. My name is Brian Wilson and as always I have my co-host Andy Grabner today. Andy, how are you doing? I'm really good. You're so polite all of a sudden. You made some terrible jokes earlier about me, and what's wrong? What's happened? Why are you polite all of a sudden? Repentance. I feel bad, you know. Plus, you know, the people on the other side of the

Starting point is 00:00:49 speaker, they only have to know about our beautiful relationship. They don't have to know about the dark side and all the dirty secrets. Yeah, yes, yes, yes. That'll come out in, like, you know, 10 years from now when we're both on Skid Row with bad drug habits; it'll be like how it fell apart. Anyway, speaking of things falling apart, I'm going to try an anti-segue. Speaking of things falling apart, right? Deployments fall apart all the time.

Starting point is 00:01:17 New technologies fall apart all the time. A lot of times people go ahead and try to use the fancy new toy and there are no good guidelines on how to get started, no good ways to say, hey, what could we have done if we only knew or had some experience to build on? How does this sound? Am I going in the right direction, Andy?

Starting point is 00:01:36 Do you want to save me? Yeah, I'm not going to save you. I'm just continuing your thoughts. The latest new cool technology, as we all know, is Kubernetes. We had several people on the show lately talking about Kubernetes and what it allows us, kind of getting us to the promised land of enabling every developer to deploy into production at any point without any problems, because it's endlessly scalable, it performs well, and you don't

Starting point is 00:02:01 have to think about anything else other than writing a couple of lines of YAML code. Just, just YAML. Yeah, just, yeah, that's the only thing. But the thing is, no, so for real, we all move in that direction, right? Most of the listeners are probably already on Kubernetes. Some will have to learn it. And there are happy moments and sad moments, and we want to make sure that you will have more happy moments than sad moments with Kubernetes. And this is why we brought Christian Heckelmann on the podcast today. Christian from ERT, who has been, I think... Christian, first of all, hi. I want to give you the word. Hi, hello. Christian, we've been talking, we've been working together for, you know, a year or two or even more. I don't remember. I think it was Barcelona. Yeah, so

Starting point is 00:02:52 it's, with all the COVID around, you cannot tell what year it is right now, right? Exactly. And you've been a part of the journey with, you know, obviously Dynatrace and with Keptn. But today is really not about Keptn, is not about Dynatrace. It's really about your experience and the work that you are doing in adopting Kubernetes for the company you work for and the things you've learned, the things you love,

Starting point is 00:03:17 the things you don't like so much, things you wish you had known before you got started. And you did a great presentation, which we're probably going to share. You called it "How Not to Start with Kubernetes". A lot of great wisdom, a lot of great memes in there, and it's more kind of a comic book. It's, yeah. Yeah, which makes it more enjoyable to learn and work with Kubernetes, right? But in all seriousness, before we get started, Christian, can you give us a little background on yourself? Because people can then better relate to who you are, where you came from, especially professionally,

Starting point is 00:03:51 so that they know why you're actually transitioning over to Kubernetes and what you're doing right now with Kubernetes. Oh, yeah. So first of all, my current position is DevOps engineer at ERT. A couple of years ago I didn't have anything to do with Kubernetes or Docker at all. I was a database administrator for around six, seven, eight years, I don't recall at the moment, and before that I was in charge of the network of my former company. And before then I was kind of a field engineer outside. This is my second time working for ERT, and when I started at ERT, we didn't have any Kubernetes, any Docker environments or something like that.

Starting point is 00:04:46 We were deploying our services via Puppet or on IIS with Web Deploy, etc. And I think it was three years ago or something like that. It was Kubernetes 1.9 back in the days. We started to, yeah, have a POC, because one of our architects was asking us, so, hey, do we want to start... and Mesos, for instance. And we did a comparison. And that was the first mistake we made, that we did this as a kind of submarine project, as I would call it in German. I don't know if there is the same term in English, because I'm not a native speaker. My task was then to install a Kubernetes cluster which our developers could play around with. And another guy was taking the Docker Swarm part.

Starting point is 00:06:02 And there the whole journey started, I would say, with Kubernetes in our company. And as I said, that was the first mistake: not to bring everybody on board when building the platform. Because building a Kubernetes cluster, or, yeah, installing a Kubernetes cluster, is not the problem, but maintaining it and thinking about all the other stuff like networking, storage, and so on and so on is. This is quite a steep learning curve, at least from my perspective. And so, yeah, so I started, yeah,

Starting point is 00:06:40 going to kubernetes.io, or whatever the domain is, and then reading through the docs on how to install Kubernetes. I provisioned myself three Linux VMs and started with kubeadm and all that stuff, right? Basically building Kubernetes from scratch. And it was kind of, okay, it's working. And then we thought about, okay, how are we getting our traffic inside the cluster? And it was kind of, okay, googling and then searching by myself. So if you start using Kubernetes, I would always recommend: get help. If you don't have the knowledge in your company, get external knowledge, consultants or whatever, and start working

Starting point is 00:07:29 with them. Because a lot of mistakes I made in the past were just because I didn't know any better, or you couldn't find any better solutions on the internet. So yeah, that's quite a big challenge, running Kubernetes, especially if you're running it on-prem, right? So running Kubernetes on-prem is a pain in the... I just wanted to ask a question about that

Starting point is 00:07:54 because it's an interesting topic. Andy, as we often discuss with guests, a lot of times they're experimenting on their own, just like you or Christian. And it's hard to get buy-in until you can prove something out. So what you just proposed there about getting external help, at least if it's paid, is sort of one of those chicken-and-egg problems. Because how are you going to get budget to get external help if you haven't proven something out, maybe haven't even gotten buy-in yet from your management? Obviously, we've always seen success

Starting point is 00:08:25 cases on migrations have been with upper-management buy-in. But if you're doing this from the ground up in the submarine project, how would you recommend somebody goes about it? Are there free external resources that you're aware of? I know you mentioned something about the internet. At this point now, as opposed to when you started, are there some resources people can leverage that they can start with, without necessarily having to go to a paid level yet? How would someone tackle that from your point of view, Christian?

Starting point is 00:08:55 So when you want to deploy Kubernetes, there are great provisioners, so tools you can use to start with and which will, yeah, give you the ability to spin up a Kubernetes cluster, quite a stable one, without the deep knowledge. Like, I stumbled across Rancher, which is an open source project, even though you can buy professional support from them as well. But if you want to provision some clusters, they have quite good documentation on how to do stuff

Starting point is 00:09:34 like configuring ingress and load balancing on your cluster and so on. So I would recommend doing something like that, or using other tools like kops, VMware Tanzu, or whatever your company is able to do. And yeah, but it's still like, when you're searching on the internet, you find, for one Kubernetes setup or what you want to achieve, tons of different documentation and guides. And when you're taking a look at the CNCF landscape, what I'm always doing,

Starting point is 00:10:14 if I need to bring in something new, like a new storage provisioner or whatever, I'm looking at the CNCF landscape and looking there what tools are available, which are not officially supported, but which are projects within the whole Kubernetes ecosystem, right? Not relying on some weird GitHub repos with some scripts in them. That's basically what I can tell people. And even if you're starting with your Kubernetes, you have to think about what kind of workload

Starting point is 00:10:49 you want to run on your cluster. It's not only that you're building a cluster and then shifting everything from your big monolith into a container and throwing it into the cluster. I think this will not make you happy operating the cluster, right? Because you will find a lot of side effects here and there. One of the resources that I found recently is called TechWorld with Nana. She is an amazing YouTuber on technology topics.

Starting point is 00:11:22 She's actually in Austria, she's from Vienna, and I reached out to her. And Brian, we may have her on one of the upcoming podcasts, hopefully, once she has time. But she did an amazing set of tutorials, and she has like several hundred thousand views on these tutorials. So a lot of them are free, most of them are obviously free on YouTube, giving you like a four-hour or an eight-hour course on getting started with Kubernetes. So I suggest, folks, we will add the link to this as well. But Christian, what I hear from you, and this is also part of your presentation, is: don't start with installing Kubernetes from scratch when by now, time has passed,

Starting point is 00:12:01 there are many options to really get started with Kubernetes. And maybe you even just go to your favorite cloud provider and then you just spin up a managed instance there. Yeah, absolutely. But even if you're starting on a managed Kubernetes cluster, if it's a GCP or an EKS cluster or whatever it's called in Alibaba Cloud, I don't know, I would say if you only have to support one kind of application or your company is

Starting point is 00:12:33 really small and you don't have a lot of dependencies across your environment, like services that need to talk with services in your on-prem environment and a lot of routing and so on, then spinning up an EKS cluster is really easy, right? But as soon as you have to connect all the services with your legacy applications and so on, then you still need to figure out, okay, how am I sizing my VPCs, how am I doing routing there, how am I doing security, and all of the other stuff you need to think about. As well as, of course, something like load balancing.

Starting point is 00:13:15 So one of the things I learned was, running Kubernetes on-prem, if you want to run a load balancer, you first need to have something like MetalLB in place, so you can have a floating IP around your cluster, and then set an ingress controller on top to distribute the traffic inside of your cluster. And if you're running this in a, let's say, managed system, then you have to think about the costs as well, right? So sure, you can install the AWS LB controller, and every time you're deploying a service, it will spin up a complete, entire load balancer for you,

Starting point is 00:13:55 which is great. But if you have a couple of hundred services running on your cluster, then it will get quite expensive, having 400 load balancers in place. And then you should think about, okay, I need an ingress controller for them, because not all services need to have their own load balancer, right? So you're... and so on.
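To make the load balancer point concrete, here is a minimal Python sketch, not anything from the presentation itself. It scans Service manifests (plain dicts, as they would look after parsing YAML) and lists the ones that request their own load balancer through `type: LoadBalancer`, which are the candidates to move behind a shared ingress controller. The function name and the sample manifests are hypothetical.

```python
from typing import Any


def services_with_own_load_balancer(manifests: list[dict[str, Any]]) -> list[str]:
    """Return the names of Services that request a dedicated load balancer."""
    names = []
    for manifest in manifests:
        if manifest.get("kind") != "Service":
            continue
        # A Service without an explicit type defaults to ClusterIP.
        service_type = manifest.get("spec", {}).get("type", "ClusterIP")
        if service_type == "LoadBalancer":
            names.append(manifest.get("metadata", {}).get("name", "<unnamed>"))
    return names


# Hypothetical sample data, for illustration only.
sample_manifests = [
    {"kind": "Service", "metadata": {"name": "web"}, "spec": {"type": "LoadBalancer"}},
    {"kind": "Service", "metadata": {"name": "api"}, "spec": {}},
    {"kind": "Deployment", "metadata": {"name": "api"}, "spec": {}},
]

print(services_with_own_load_balancer(sample_manifests))  # ['web']
```

On a managed cluster, every name in that list typically corresponds to a separately billed cloud load balancer, which is why a long list is worth reviewing.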

Starting point is 00:14:17 So there are still considerations you need to make when you're running on the cloud, but it's getting much easier, because a lot of issues I had were just because of the fact we didn't know how to configure, or have best practices in place for, our worker nodes. Let's say some system settings

Starting point is 00:14:40 on your CentOS machines, right, which was, yeah, so little story here. I had the issue that, with CentOS and the XFS file system, the slab unreclaimable cache was filling up and worker nodes were crashing all of a sudden. And this was all because of a lot of mounts which were made from one of our deployments. We didn't know why. You have to dig into the kernel crash files and read a lot of kernel bugs and so on to find out,

Starting point is 00:15:21 okay, when I changed the file system to ext4, we don't have this issue anymore. And this is what you don't have to think about when you're running on an EKS cluster or a managed cluster, right? But you still have to think about, yeah, other stuff you need to consider, like the load balancing I mentioned before, right? Yeah, so, I mean, you bring up a lot of points that, I mean, I didn't run into each of those because I'm not as deep down

Starting point is 00:15:52 into the weeds and having to provision and manage these Kubernetes clusters for a whole organization, but just for the work we've been doing with Keptn, where we have been providing different ways to stand up Kubernetes clusters. Whether we give people instructions on how to use EKS and AKS or something like that, that's kind of easy. But then people are not familiar with Kubernetes and maybe they want to run it on K3s, as you know, right? They've been using K3s or MicroK8s. But all of a sudden, I think this was my wrong perception: I thought

Starting point is 00:16:23 I run a command and have a Kubernetes cluster, and all of a sudden I can just deploy my apps and I don't have to take care of anything else because Kubernetes takes care of everything. But then I realized that I have to understand: what is an ingress? How can I make sure that my ingress is exposing my services to the outside world with SSL, with TLS? And these are all things that I, coming from my background as a developer, never thought of. I never had to think about networking.

Starting point is 00:17:01 I never had to think about security. And all of a sudden, if you really want to use this platform as a quote-unquote self-service magic tool, you have to take care of this, right? And this is where I think some of the misconception always comes in. It is challenging. Yeah, it is challenging. And as you mentioned the developers: developers should focus on writing code and not think about how networking is being done and so on and so on. But you need to understand the basic concepts of Kubernetes, how, for instance, you're calling another service

Starting point is 00:17:38 within your cluster, right? So what I always see is developers installing a service, deploying a service in the Kubernetes cluster, defining an ingress, and then calling another service through the ingress. So the traffic is flowing outside of the Kubernetes cluster to the public internet and back to the Kubernetes cluster, which is basically, yeah, latency, right, for your service. Well, latency and costs. And costs, yeah, yeah.
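As an illustration of the alternative to going out through the ingress, the following Python sketch builds the in-cluster address of a Service from its name and namespace, using the standard Kubernetes DNS form `<service>.<namespace>.svc.<cluster-domain>`. The helper name and the example service are hypothetical, and `cluster.local` is only the common default cluster domain, so a given cluster may differ.

```python
def in_cluster_url(service: str, namespace: str, port: int,
                   cluster_domain: str = "cluster.local") -> str:
    """Build the cluster-internal URL of a Service.

    Calls to this address stay inside the cluster network instead of
    leaving through the public ingress and coming back in.
    """
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}"


# Hypothetical service "orders" in namespace "shop", listening on port 8080.
print(in_cluster_url("orders", "shop", 8080))
# http://orders.shop.svc.cluster.local:8080
```

Within the same namespace the short name (`http://orders:8080` in this example) resolves as well; the fully qualified form is shown because it works from any namespace.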

Starting point is 00:18:06 Absolutely. And I also had no clue about that, right? I mean, again, when I started, and now a year and a half into it, I completely agree with you. Next time, if I started with something like this, I would take the time and sit down and go through documentation, some training, because you can avoid a lot of basic mistakes like exactly this. And you are not aware of these mistakes. Well, like, you know, we talked about the load balancers. If I spin up an EKS and I just say load balancer ingress, oh, that's awesome.

Starting point is 00:18:38 And everything works fine. And then you get the bill at the end of the month. Exactly. Exactly. And then also other mistakes. There's one slide in my presentation where

Starting point is 00:18:52 there's this Oprah meme with "everybody gets admin". This was in the beginning. I didn't have a clue about RBAC and all the security stuff in Kubernetes. That was something I learned afterwards, after developers deleted entire namespaces, and environment-based namespaces, not application namespaces, even environment namespaces, and how to prevent stuff like that.

Starting point is 00:19:19 Or, let's say, if you forget in your ingress annotation, in our case, the host field, it will create an asterisk, a wildcard ingress for you. So all of a sudden, all traffic was routed to the one deployment and everybody else was yelling, what's happening? When I'm calling my URL, I'm getting the other service, right? And so I could mitigate this using the Open Policy Agent, for instance. But RBAC is also something you need to think about, and don't give everybody admin access on the server.
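The missing-host problem can be sketched as a simple check. The Python below is only a stand-in for the kind of rule an admission policy would enforce; it is not Open Policy Agent or Rego code, and the function name and sample manifest are hypothetical. It inspects a parsed Ingress manifest and returns the indexes of rules that have no `host`, since a rule without a host applies to all inbound HTTP traffic.

```python
from typing import Any


def rules_missing_host(ingress: dict[str, Any]) -> list[int]:
    """Return the indexes of Ingress rules that do not set a host."""
    rules = ingress.get("spec", {}).get("rules", [])
    # An absent or empty host makes the rule match every hostname.
    return [index for index, rule in enumerate(rules) if not rule.get("host")]


# Hypothetical manifest: the second rule forgot the host field.
sample_ingress = {
    "kind": "Ingress",
    "metadata": {"name": "shop"},
    "spec": {
        "rules": [
            {"host": "shop.example.com", "http": {"paths": []}},
            {"http": {"paths": []}},
        ]
    },
}

print(rules_missing_host(sample_ingress))  # [1]
```

A check like this could run in a CI pipeline before deployment, while a policy engine in the cluster rejects the same manifests at admission time.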

Starting point is 00:20:04 Yeah, because you would also not give everybody full access to all the production servers all the time. Yeah, absolutely. And I think this is the challenge. Again, this is kind of the balance we all preach, at least, you know, in the work I do: we preach about autonomy. We preach about how can we make,

Starting point is 00:20:22 how can we give everybody more responsibility, but still give them enough guardrails so that they cannot make mistakes. And I think this is the challenging thing now with Kubernetes to find out as well. It's a great platform if you use it right. But it's also such a huge platform that not everybody should have access to everything. But you have to have a basic knowledge, and you have to figure out what makes sense to give into the hands of certain people and what doesn't, where you need to provide some processes using tools like continuous delivery tools that can then automate certain things, right? Yeah, so, so a big part is

Starting point is 00:20:56 documentation: to have documentation for the developers in place, to have templates in place they can reuse. Like, I created a Helm template, for instance, which was basically covering most of our deployments and, yeah, preventing stuff like accidentally exposing a service to the internet when it doesn't need to be exposed to the internet, right? Or setting resource limits by default on the deployment. Because what I've seen, this was also a big learning thing for me, dealing with resource limits and requests in Kubernetes. So on our first cluster, I was always wondering, okay, why is the node just exploding? Why is it not working anymore? What is happening there?
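One way to picture what "resource limits by default" guards against is a check over a parsed Deployment manifest. This Python sketch is an assumption about how such a review could be automated, not the Helm template from the presentation; the function name and sample manifest are hypothetical. It lists the containers that lack either `resources.requests` or `resources.limits`.

```python
from typing import Any


def containers_missing_resources(deployment: dict[str, Any]) -> list[str]:
    """Return names of containers without both resource requests and limits."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        if not resources.get("requests") or not resources.get("limits"):
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Hypothetical Deployment: "app" is fully specified, "sidecar" is not.
sample_deployment = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "app",
                        "resources": {
                            "requests": {"cpu": "250m", "memory": "256Mi"},
                            "limits": {"cpu": "500m", "memory": "512Mi"},
                        },
                    },
                    {"name": "sidecar"},
                ]
            }
        }
    },
}

print(containers_missing_resources(sample_deployment))  # ['sidecar']
```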

Starting point is 00:21:49 And then I figured out, okay, so the pods are utilizing too much space or whatever, too much memory. Java applications have quite high memory consumption, right? So what I did is, first of all, setting resource limits on namespaces, so they cannot overutilize, yeah, the RAM or CPUs, and also building this into our template, so this will not be forgotten anymore when they are deploying something, right? And yeah, stuff like that. So documentation, trainings for the developers when they want to start using Kubernetes, because building a Docker image is quite easy, deploying something to Kubernetes

Starting point is 00:22:39 is quite easy. But does it always make sense? That's the other question, right? Yeah, that's actually a good one. So is Kubernetes necessarily by default the right choice for any type of app, or might it be better to think about other platforms in the future? Or, like, I don't know, it could be that you're just deploying a container on ECS or Fargate, or maybe serverless is better, or maybe it's just an old-fashioned VM and you just run it somewhere. Yeah, so the best example I have here is websites, static websites. So why should you deploy a static website as a container or deployment in Kubernetes when you can put it into an S3 bucket and you're good to go, right? That's

Starting point is 00:23:32 a running container, it's constant. Do you run regular training sessions? How does this work? Do you have regular training sessions with developers, or how does this work? Yeah, so from time to time, when I'm seeing there is something popping up like, hey, we want to deploy something to Kubernetes, then I'm reviewing the application or what they are doing there, giving guidance, having training; it depends on the knowledge of the developers. It all comes down to governance: when somebody is writing a new service, an architect should look into it and should decide, okay, does it make sense to run it on Kubernetes? And not using Kubernetes only because you can spin up an environment very fast without having, yeah, to tell anybody else, right? Because what I've seen when we started using Kubernetes was that every developer was throwing their application into a container and onto the cluster because they didn't need to go over the, I don't know, the hurdle. The hurdle.

Starting point is 00:24:38 The hurdle, yeah, whatever, I apologize, to create a merge request in our Puppet Hiera configuration, right? And they used Kubernetes to bypass this, even if the service doesn't make sense at all to run in Kubernetes, right? You know what I'm hearing a lot of in this conversation, and thank you both, you know, I've been quiet because I've been trying to listen and learn. It sounds as if there's a disconnect between the general marketing of Kubernetes, and I'll use that loosely because there's no Kubernetes company per se, right? But the general marketing of Kubernetes as: pop it in, everyone can be self-sufficient, and things just run smooth and easy, right? Which most of us at this point know is not as simple

Starting point is 00:25:32 as that. But Christian, as you mentioned, things like training, like documentation, and all these things that for decades people have been trying to move away from. We're obviously not moving anywhere. Now, the way I see Kubernetes, and I don't mean to say that Kubernetes is a loss, right? Because Kubernetes is opening a whole new world for great other things. I think there's just still a lot of overhead and maintenance, scripting and all,

Starting point is 00:26:00 but a different type. And the benefit of Kubernetes is that, as opposed to, let's say, the old days where you had to stand up your servers, you had to have a physical location to run it yourself, you couldn't just pay someone else to do it easily, you had to pick your network, set up your network, do all this other kind of stuff, you can now spend time automating a lot of your deployment process, which gives you back that time

Starting point is 00:26:30 to do these things like the documentation, the training, and these other bits that are still gonna, you know, some of these things are not going to go away. We're still humans. We're still stupid, meaning we don't have these things programmed into us. And if program Andy crashes, we can't rely on program Andy being there, so this has to be documented. So I guess what I'm trying to say in a long-winded way is that while not all the things that we all hate about traditional setups go away,

Starting point is 00:27:01 the critical functions still remain, yet because of the advancements in technologies, this gives us the ability to automate and push past a lot of these other things that we would normally have to do in tandem with the documentation and everything else. So it's not a net loss. It's still a net win. But I think most people have to come to the realization that this is not some magic fairyland. If we think back to Cloud Foundry, the idea is: here's my code, so, I forget what the haiku was, Andy, you know, "here's my code, deploy it, I don't care where." You know, you still have a whole team of people

Starting point is 00:27:35 maintaining that Cloud Foundry thing, but at least for the developer on that side, it's a bit more abstracted. Here, as you're saying, there are a lot of things you still have to know and learn. And I guess, just bursting the bubble, it's not just plug and play. Did plug and play ever come to fruition? Remember way back when plug and play came on, or am I dating myself? It's, yeah, it's still gonna be tough, and there are still a lot of things, and that's why I think this guide that you put together is so awesome, because it's covering a lot of these things. Earlier, when I was talking about it, are there any resources out there for people? Although this isn't a definitive

Starting point is 00:28:09 resource, this gives you a lot of things to say, hey, what are some check marks we should go through? Anyhow, I'll shut up now. I've been rambling. No, that was good. And I think... So, what I already told Andy yesterday: when you're building

Starting point is 00:28:28 a Kubernetes environment or Kubernetes cluster, it's not only the platform the developers are using that you're building. In the end you're building a small data center within your data center or within the cloud, right? With all the different dependencies like networking and so on, storage, blah, blah, blah. And you need to think about that stuff, and you need people who are taking care of that stuff as well. I mean, I had a little smile on my face when Google announced their Autopilot feature for Kubernetes last week or this week, I don't remember, which is taking away a lot of administrative tasks but

Starting point is 00:29:06 also saying, okay, you can now only use this kind of networking provider or CNI plug-in or whatever, right? But basically, in the end, you can only control the complexity if you're reducing the number of potential combinations that make it so complex, right? That's why it becomes an opinionated platform. And we're back to Cloud Foundry, as I mentioned once in an episode several months ago. Yeah. And I think they had an idea there. Yeah. I completely agree with you, Brian. And I think this is also what we all try to do: we try to leverage the new shiny thing, but then we learn that while it's great and powerful,

Starting point is 00:29:50 it in the end doesn't make us more productive, or at least not the developers. Therefore, we need to come up with a very clear, defined, prescriptive, opinionated path for doing 80 or 90% of the work. And for the remaining 10 percent, yes, we may then need to go and look into some other options outside of our opinionated way. But I agree, this is also why we invest, you know, so much in standardizing things, whether

Starting point is 00:30:19 it's OpenTelemetry, whether it's the stuff, and now I say it, with Keptn. It's all about, in the end, making it easier to get work done on top of something that is very complex. But we also have to narrow down the complexity, because we can, this is also why, if you look at what we ended up doing with Keptn, we are really narrowing it down now to say, you know, if you want to try Keptn, then you take a Kubernetes cluster and we want you to use, let's say, Istio as a service mesh.

Starting point is 00:31:00 And we just give you, sort of, in the first iteration of Keptn, we said it is deploy, test, evaluate. This was super easy and super clear. Now we went a step further because we had people that said, well, we need more flexibility. But still, by default, we give this prescriptive approach and say this is how we think you are most productive in deploying this particular type of technology or app or service, and if you use it for 80 percent of the use cases, we think you're going to be fine. And for the 20%, you can turn some knobs and you can change our opinion to be closer to your opinion.

Starting point is 00:31:34 And I think, Brian, we had a similar discussion in one of the episodes we recorded that hasn't aired yet as of today, but I think it was with Baruch, if I'm not mistaken. Oh yes, the Liquid Software. The Liquid Software, same thing. By the time people listen to this it will have aired. It's the previous episode.

Starting point is 00:31:54 But yeah. Christian, I know in the end of your presentation, and we will share it, you have a nice conclusion, and as I think Brian highlighted before we started the recording, there's a nice meme on there. It says the H in Kubernetes stands for happiness.

Starting point is 00:32:15 But you have a nice summary of points, of things that you want to be careful with. Like, don't deploy a production cluster without a review by a professional. That makes a lot of sense. And train the people. We already covered that. The templates. I know you covered it slightly,

Starting point is 00:32:38 but I have a question on templates. So if you provide templates but you allow people to modify them, templates are just templates. If you don't enforce them, do templates alone help you? Do you need governance or something else on top as well? Because otherwise people can do whatever they want with the templates. So when you have a lot of microservices and a lot of developers who are adjusting templates and so on, it's quite hard to get an overview of what they are doing there.

Starting point is 00:33:07 So at least that's what I've seen happening. But it could be another thing in other companies, right? A company that is only deploying one set of software is different from one having a lot of different systems flying around, legacy systems, and then wanting to transition to Kubernetes. But in my opinion, you still need to have some kind of governance, someone looking at new services, new deployments, which are then going to Kubernetes clusters. So developers could play on one cluster if they want, right? Nobody wants to restrict them from playing around with stuff. But as soon as the service is getting promoted to a higher stage, to an official integration environment or whatever, then somebody should really look at what they

Starting point is 00:34:08 are doing there and how the service is working. Normally, in my opinion, this is a classical task for the software architects, right? So they review the service: what the service is doing, what the communication of the service is, and so on. Another thing that I want to ask you now: you mentioned earlier some of the things that happened and that shouldn't happen, like no resource limits. You mentioned access control, and somebody accidentally deleting all the namespaces. Any other horror stories? Oh, a lot.

Starting point is 00:34:51 A lot. A lot. I think I've found every, yeah, and here again, a German word.

Starting point is 00:35:05 Was that Farfignutian? Fettnäppchen. Fettnäppchen, yeah. Literally translated, it means you're stepping into a puddle of fat. That means you make a bad step, and then you do something that you shouldn't do, right? Are there puddles of fat? Sorry, are there puddles of fat lying around in Germany that this came from?

Starting point is 00:35:23 I'll look that up. I'll look up how that came about. It's just fascinating how that term came about. Yeah, a lot of Weißwurst lying around in Bavaria. Yeah, sure. So for instance, a classical thing is the tagging of your Docker images and using something like the tag latest. I can remember I was searching around to get a storage provisioner. Right. So in the beginning, on our first or second cluster, I used GlusterFS combined with Heketi to provision persistent volumes.

Starting point is 00:36:08 And yeah, it's a classical copy and paste. And yeah, it's working. We have persistent volumes. But I didn't realize that the deployment of Heketi was using the tag latest. And node goes up, node goes down. And all of a sudden it was, yeah, and in the end it was using the image pull policy Always,

Starting point is 00:36:30 which means every time the pod gets restarted, it will pull the latest image of Heketi. And all of a sudden our storage provisioning was not working anymore. And it was kind of, okay, what's going on here? And then you have to dig into the problem and, okay, why is it not working anymore, and so on and so on. And even developers were using, for instance, a

Starting point is 00:36:53 heavily used image, you know, Alpine, and they were using alpine:latest. And I can remember in Barcelona at Perform, on one day there was a vulnerability found in the Alpine image for, I don't know, an empty root password or something like that, and they updated the Alpine image. And I was getting pinged by everyone in our company, a lot of people, kind of: hey, my service isn't working anymore in Kubernetes. Kubernetes down, Kubernetes down, help, help. Yeah, world is on fire. And I was kind of, yeah, but your pod isn't working at all, right? What should I do?

Starting point is 00:37:38 Kubernetes is working. So your pod isn't starting up, and have you tried to run the pod locally, for instance? That's also something I've seen a lot: developers are just, yeah, using the CI/CD tools to build their containers and throw them into the cluster, really developing inside of the cluster instead of checking, okay, is my Dockerfile building my application? So I have a lot of tickets regarding, hey, my deployment isn't working. Yeah, but your Dockerfile could not be built. So try it locally, fix it locally, and then you can avoid these turnaround times

Starting point is 00:38:19 for fixing stuff like that. And especially because then you become the bottleneck and you deal with things that you shouldn't deal with, because these are basic things that should be checked beforehand. Absolutely, absolutely. Yeah, I mean, for the latest tag, isn't that something where OPA comes in, the Open Policy Agent? Can that be used to validate that no latest is used? Yeah, sure. And I think it should be possible with OPA,

Starting point is 00:38:52 or how it's pronounced in English. I don't know. Policy agent. OPA. And you know OPA in German means grandfather. Yes, my mom had a friend who was a grandmother. Oma. Yeah, Oma and Opa.

Starting point is 00:39:13 I knew this from that, but I forgot what it was. I just know I heard it. I'm pretending to be cool with you guys. But I have a question. Do you also say Oma and Opa, or do you say Großvater and Großmutter? I'm saying Oma and Opa. Okay. But it depends on the region in Germany.

Starting point is 00:39:33 Yeah, same here, I guess. I mean, Oma and Opa is very Austrian everywhere, I think. As you know, I'm a little bit Austrian. Yeah. It's funny, because, I know this is a total sidetrack, but I just have to say this: a lot of times there'll be music that I like and I always want to go to Andy with it, but I'm like, no, it's German, it's not Austrian. And I'm thinking, well, is it close enough? And then I'm like, well, is Canadian close enough to America, the United States? They're totally different. They speak the same language, but it's different

Starting point is 00:40:04 enough. So that's why I don't bother you with the German music I listen to. No problem. So I'm the master of sidetracks normally. But speaking of this latest, just to clarify, right, because I'm pretty sure I understand what the deal is, but for people who might not understand what people

Starting point is 00:40:20 are doing, what it sounds like people are doing is instead of saying which version they want to get, because a lot of times when you're getting something from Docker or GitHub or something, you just say latest, and that's a default tag that gets applied to whatever the latest push is. That's not something that people necessarily even put on their builds.

Starting point is 00:40:38 But latest will always just get the latest. So the recommendation would always say use the specific version that you want instead of just latest, because obviously if you're always getting latest as soon as someone updates it you're going to get that new one and you have no idea what you're going to get. And another one, it's a security issue as well. If you're pulling some images straight from Docker Hub, you are not aware of what is inside

Starting point is 00:41:01 the image. Even popular public images could be compromised, and so I always recommend having a set of base images in your own registry which developers can reuse, which you have scanned and released for official use by your developers. That's one of the things I would recommend instead of, yeah, running all the stuff. But as Andy asked me for other mistakes I've seen: despite the fact, and I should call my Twitter handle Grumpy Admin after the show because I'm only complaining and complaining and complaining, but Kubernetes is great. Kubernetes is great for automation and so on.
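The two recommendations above (pin a specific version instead of latest, and pull only from a registry you control) could be checked automatically before a deployment is accepted. The following is only a minimal sketch in plain Python, not OPA or Rego, which is what the speakers actually mention; the function name image_problems and the registry name registry.example.internal are assumptions made up for illustration.

```python
from typing import List

# Hypothetical allow-list; this registry name is an assumption for illustration.
APPROVED_REGISTRIES = {"registry.example.internal"}


def image_problems(image: str) -> List[str]:
    """Return the policy problems found in one container image reference."""
    problems = []
    # Split off an optional digest ("name@sha256:...").
    name, _, digest = image.partition("@")
    first, slash, _ = name.partition("/")
    # The first path component counts as a registry host only if it looks
    # like one (contains a dot or a port, or is "localhost"); otherwise the
    # image comes from Docker Hub.
    has_registry = bool(slash) and ("." in first or ":" in first or first == "localhost")
    registry = first if has_registry else "docker.io"
    # The tag, if any, follows a colon in the last path segment.
    last_segment = name.rsplit("/", 1)[-1]
    tag = last_segment.partition(":")[2]
    if not digest and tag in ("", "latest"):
        problems.append("image is not pinned to a specific version")
    if registry not in APPROVED_REGISTRIES:
        problems.append(f"registry {registry} is not approved")
    return problems


if __name__ == "__main__":
    for ref in ("alpine:latest", "registry.example.internal/base/alpine:3.19"):
        print(ref, image_problems(ref))
```

A check like this would typically run in the CI pipeline or in an admission step, so that an unpinned image is rejected before it reaches a shared cluster rather than discovered when a node restarts.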

Starting point is 00:41:45 But from time to time, you're thinking about, okay, classical example: somebody is deploying an application and it's kind of, okay, I've deployed two pods. And you're kind of, okay, and where is your auto-scaling configuration? I've deployed two pods. It's highly available and I'm fine with that. And you're kind of, okay, why don't you use all the stuff Kubernetes is for,

Starting point is 00:42:10 right, automatically scaling your deployment? It's like deploying two VMs in a data center and you're good to go. No, that's not how it should work, right? And here we are again, back on the training story, with something like that. Or health checks in the deployment. If you're getting an email from one of the developers like, hey, can you restart my service in Kubernetes? You'll be like, what the? Right? Because this is a core capability and exactly what it's built for. I understand.
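The missing health checks and missing auto-scaling configuration described above are both visible in the manifests themselves, so a review step could flag them. What follows is a rough sketch, assuming the Deployment and any HorizontalPodAutoscaler objects have already been parsed from YAML into Python dictionaries and live in the same namespace; the helper names and the example service "shop" are made up for illustration.

```python
from typing import Dict, List


def containers_without_probes(deployment: Dict) -> List[str]:
    """Names of containers that lack a liveness probe or a readiness probe."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    return [
        c["name"]
        for c in containers
        if "livenessProbe" not in c or "readinessProbe" not in c
    ]


def has_autoscaler(deployment: Dict, autoscalers: List[Dict]) -> bool:
    """True if some HorizontalPodAutoscaler targets this Deployment by name."""
    name = deployment["metadata"]["name"]
    return any(
        hpa["spec"]["scaleTargetRef"].get("kind") == "Deployment"
        and hpa["spec"]["scaleTargetRef"].get("name") == name
        for hpa in autoscalers
    )


# Example Deployment as it would look after parsing; all values are invented.
deployment = {
    "metadata": {"name": "shop"},
    "spec": {
        "replicas": 2,  # "I've deployed two pods"
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "registry.example.internal/shop:1.4.2",
                        "livenessProbe": {
                            "httpGet": {"path": "/healthz", "port": 8080}
                        },
                        # No readinessProbe, so this container gets flagged.
                    }
                ]
            }
        },
    },
}

if __name__ == "__main__":
    print("containers missing probes:", containers_without_probes(deployment))
    print("has autoscaler:", has_autoscaler(deployment, []))
```

With a liveness probe in place, the kubelet restarts an unhealthy container on its own, which is the core capability referred to above; nobody should need to email an admin for a restart.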

Starting point is 00:42:47 I think it's great that you're complaining because, again, we always hear about the positive side of Kubernetes, of which there's a lot, right? But this is the real world. What are the admins dealing with? What are the things that people are still... What's the people factor, the human factor of leveraging it? You can build in whatever guardrails you want. You can't, as they

Starting point is 00:43:05 say, you can't fix stupid, all right? So you still have that human factor that's gonna do things wrong, not leverage what's built in there. You know, it goes back to way back when I had a job at a record store. We had a big suck because there was like this island in the middle of the store where we had, you know, Walkmans and radios that you could buy, but then the register was at the front, and over the register was a gigantic sign with an arrow that said: register, where to pay. People would go stand at the other side

Starting point is 00:43:32 of it with their CDs in their hand, willing to buy it. You're like, it's over here under the big sign. You can't pull that out. Someone's not doing a health check. What do you say? How do you deal with that? That's something you can't program.

Starting point is 00:43:48 I mean, fortunately and unfortunately, right? Because if we could be programmed, well, that's a whole different debate we won't get into. But if we could be programmed as easily as computers, then we wouldn't be human, I guess. Yeah, and I have to say, people are people. And from time to time, people are, so, and I see it every

Starting point is 00:44:09 day at the moment, because in front of my window they are rebuilding a bridge and they closed the streets around it, and you would not believe how many people are driving into the construction site, even big trucks, and then trying to turn around and so on. It says, hey, stop here, the road is closed, but people are still driving through the construction site. And it's the same with developers. It's the same with administrators. It's the same everywhere. And I'm not better. No, no, no, not at all.

Starting point is 00:44:45 We all make mistakes. It's good to hear about them, though. Yeah, and then also, there was this episode, I cannot recall her name, but CD version two, right? And this is cool, there are a lot of cool concepts. And I got her point when she said, okay, we only need one, or that people are thinking we only need one big Kubernetes environment and throw everything, every, I don't know, development, integration, or production, pre-prod environment

Starting point is 00:45:17 into one Kubernetes cluster. But as Andreas mentioned in this episode, how do you validate if you want to, I don't know, upgrade your storage provisioner inside the cluster? If you want to upgrade your ingress controller in the cluster, that it will work. So these are central components of your Kubernetes.

Starting point is 00:45:38 Or if you want to upgrade the Kubernetes version at all, I would say I'm more on the side of testing it in advance on another system before I'm running it on production. And, you know, it's the whole shiny new world. It's blinking around and everybody wants to use it, as you already mentioned. But not everything is gold, right? Yeah, and I think this whole thing with the, you cannot run everything on one cluster,

Starting point is 00:46:16 this actually came from the session we had with Kelsey Hightower, I believe, or at least I heard it from him, that you always need to have environments, you need to have stages, because you need to test these changes on the underlying platform. And you cannot just do everything in production. Considering the time, because we have a hard stop with the recording in a couple of minutes, Christian, I want to do one quick thing with you. If somebody starts learning Kubernetes today, or tomorrow, because it's late here, what are the five terms? Because every technology comes with new terms, right?

Starting point is 00:46:51 What are the five terms everybody needs to understand so that they know what this is all about? And I'll start with one, so you know what I'm getting at. Mine is kubectl or kubectl. Everybody needs to know that this is the primary tool with which you interact with Kubernetes. It's a command line interface to Kubernetes. What else do people need to know about

Starting point is 00:47:11 when they hear it the first time? So I would say the Kubernetes objects and the like. What is an ingress? What is a service? What is a deployment? What is a stateful set, for instance? What are operators? So it's the basic terminology, and also how, in a Kubernetes cluster,

Starting point is 00:47:37 traffic is flowing around. So how to reach other services, for instance, over the internal DNS. You know what I mean? Yeah, this is what I would recommend that people at least know about, right? And yeah, as you said, kubectl, and also how to monitor their deployments. Well, that's easy. Deploy OneAgent, the OneAgent Operator. And I have to say,

Starting point is 00:48:19 this was one of the biggest things. So when it comes to monitoring, and I saw that Dynatrace released the OneAgent, I was really happy about it, because before it was kind of, how to use Prometheus, what kind of metrics do I need to pull from the API server, whatever it is. And with OneAgent, it's, yeah, it's perfect.
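On the earlier point about reaching other services over the internal DNS: inside a cluster, a Service is normally reachable under a name built from the service name, its namespace, and the cluster domain. This is a small sketch of that naming scheme, assuming the common default cluster domain cluster.local (a cluster can be configured with a different one); the service "orders" and namespace "shop" are invented for illustration.

```python
def service_dns_name(service: str, namespace: str,
                     cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster DNS name of a Service."""
    # Pattern: <service>.<namespace>.svc.<cluster-domain>
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_url(service: str, namespace: str, port: int,
                scheme: str = "http") -> str:
    """Build a URL another pod could use to call the Service."""
    return f"{scheme}://{service_dns_name(service, namespace)}:{port}"


if __name__ == "__main__":
    print(service_dns_name("orders", "shop"))
    print(service_url("orders", "shop", 8080))
```

Pods in the same namespace can usually use just the short service name, while calls across namespaces need at least the service and namespace parts.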

Starting point is 00:48:45 Can I throw another question? Not a question, but to this idea, Andy, that you're bringing up there. Now, again, I'm only in theory. I'm on the pre-sale side. I'm not dealing with this in real life. But based on conversations we've had with, let's say, Kelsey and some others, would it also be a good idea for anybody moving into Kubernetes to be able to answer why they're using Kubernetes? Like in one or two paragraphs or less,

Starting point is 00:49:11 why are you using Kubernetes? And if they can't answer that, don't start, maybe. I mean, that's a little extreme, but... And I think this is actually a great point at the end of Christian's presentation, the last point: think about how you want to deploy your apps before you start using Kubernetes. So think about this first, not the reverse, but really before you even get started, answer that question. Yes, it's like I said before, you have to think about what workload you want to run on Kubernetes, what you want to achieve, what you think is the advantage you want to utilize from Kubernetes

Starting point is 00:49:49 to run this workload on, right? All right, Brian, I know we have a hard stop. That's why I think we want to kind of conclude here. And Christian, it was a pleasure having you. I know this is not going to be the last time. I know we also have a lot of other things planned. We will be speaking at the Red Hat Summit, and the Red Hat podcast has also invited us, so that's going to be great. I know you are, I always keep telling you, start saying no

Starting point is 00:50:17 at some point because you have a lot of work to do in your regular life, but I'm still happy that you often say yes when we ask you. So thank you so much. You're welcome. You're welcome. And it's a pleasure to be on a show where people like Kelsey Hightower and so on were guests as well. But do you have a pair of sneakers here? That's the big question. Not from Perform 2021.

Starting point is 00:50:39 I have my pair of sneakers from 2020. So you at least have a pair. Okay. But I'm also wearing and, and the podcast listeners will not see it, but we can, we can take a screen. You've got the nice,

Starting point is 00:50:51 you've got the nice Dynatrace socks as an employee. I got the crummier ones. I got the, the $2 version. You have it? Yeah, I got it. Those are awesome.

Starting point is 00:51:00 All right. Well, thank you very, very much, Christian. Do you have any social media you want people to follow? LinkedIn, Twitter, anything that you're spouting off all of your brilliant observations? Or do you have a LinkedIn? I have a Twitter account with a very professional handle. It's called at Wurstsalat. At what a lot?

Starting point is 00:51:24 Wurstsalat, sausage salad in German. Oh. That's great. And then don't expect any let's say useful content there. Okay. All right. Really, thank you for being

Starting point is 00:51:40 on. Andy, was there anything else or should we go ahead and wrap up? No, I think I'm good. Thank you so much, Christian. That's really it. I'll see you soon. If anybody has any questions and comments, you can reach us at pure underscore DT on Twitter or send us an email at pureperformance@dynatrace.com. Thank you so much

Starting point is 00:51:56 for listening, everybody. And Christian, thank you so much for being on. This was very enjoyable. Andy, as always, thanks for being awesome. Bye, everybody. Bye-bye.
