PurePerformance - Encore Presentation: How not to start with Kubernetes – Lessons learned from DevOps Engineer Christian Heckelmann

Starting point is 00:00:00 The following is an encore presentation of Pure Performance. Andy and Brian will be back in 2022 with new episodes. Thank you. It's time for Pure Performance. Get your stopwatches ready. It's time for Pure Performance with Andy Grabner and Brian Wilson. Hello, everybody, and welcome to another episode of Pure Performance. My name is Brian Wilson, and as always, I have my co-host Andy Grabner with me today.

Starting point is 00:00:42 Andy, how are you doing? I'm really good. You're so polite all of a sudden. You made some terrible jokes about me earlier, so what's wrong? What happened? Why are you polite all of a sudden? Repentance. I feel bad. Okay. Plus, the people on the other side of the speaker only need to know about our beautiful relationship; they don't need to know about the dark side and all that. Not the dirty secrets, yeah. Yes, yes, yes. That'll come out 10 years from now when we're both on Skid Row with bad drug habits.

Starting point is 00:01:13 It'll be like "how it all fell apart." Anyway, speaking of things falling apart, I'm going to try a segue. Deployments fall apart all the time. New technologies fall apart all the time. A lot of times people go ahead and try to use the fancy new toy, and there are no good guidelines on how to get started, no good

Starting point is 00:01:36 ways to say, hey, what could we have done if we had only known, or had some experience to build on? Am I going in the right direction, Andy? Do you want to save me? I'm not going to save you. I'm just continuing your thoughts. The latest cool new technology, as we all know, is Kubernetes.

Starting point is 00:01:53 We've had several people on the show lately talking about Kubernetes and what it allows us to do, kind of getting us to the promised land of enabling every developer to deploy to production at any time without any problems, because it's endlessly scalable, it performs well, and you don't have to think about anything other than writing a couple of lines of YAML. Just YAML, yeah. Just YAML. That's the only thing.

Starting point is 00:02:17 The one and only. But no, for real, we're all moving in that direction. Most of the listeners are probably already on Kubernetes; some will have to learn it. And there are happy moments and sad moments, and we want to make sure you have more happy moments than sad moments with Kubernetes. This is why we brought Christian Heckelmann on the podcast today, Christian from ERT. Christian, first of all, hi, I want

Starting point is 00:02:48 to give you the floor. Hi. Hello. Hey, Christian. We've been working together for, you know, a year or two or even more, I don't remember. I think it was Barcelona in... Yeah, yeah. With all the COVID around, you can't tell what year it is right now, right? Exactly. And you've been part of the journey with, obviously, Dynatrace and Keptn. But today is really not about Keptn and not about Dynatrace. It's really about your experience and the work you are doing to adopt Kubernetes at the company you work for: the things you have learned, the things you love, the things you don't like so much, the things you wish you

Starting point is 00:03:28 had known before you got started. And you did a great presentation, which we're probably going to share, titled "How not to start with Kubernetes." A lot of great wisdom and a lot of great memes in there. It's more like a comic book, which makes it more enjoyable to learn and work with Kubernetes. But in all seriousness, before we get started, Christian, can you give us a little background on yourself?

Starting point is 00:03:54 That way people can better relate to who you are and where you came from, especially professionally, why you transitioned over to Kubernetes, and what you're doing with Kubernetes right now. So, yeah. First of all, my current position is DevOps engineer at ERT. A couple of years ago,

Starting point is 00:04:16 I didn't have anything to do with Kubernetes or Docker at all. I was a database administrator. And before I was a database administrator, for roughly six, seven, eight years, I don't recall exactly, I was in charge of the network at my former company, and before that I was a field engineer. This is my second time working for ERT. And when I started at ERT, we didn't have any Kubernetes or Docker environments or anything like that. We were

Starting point is 00:04:54 deploying our services via Puppet or on IIS with Web Deploy, et cetera. I think it was three years ago or so, Kubernetes 1.9 back in the day, when we started a POC, because one of our architects asked us, "Hey, do you want to start building something in Kubernetes? We want to use it as a microservice platform," and so on. But there were also competitors on the market at that time, like Docker Swarm and Mesos, for instance, and we did a comparison. And the first mistake we made was running this as what I would call in German a "submarine project," a project run under the radar. I don't know if

Starting point is 00:05:47 there is the same expression in English, because I'm not a native speaker. My task then was to install a Kubernetes cluster our developers could play around with, and another guy took the Docker Swarm part. That's where the whole journey with Kubernetes started at our company, I would say. And as I said, the first mistake was not bringing everybody on board when building the platform, because building or installing a Kubernetes cluster is not the problem. Maintaining it is: thinking about all the other stuff like networking, storage, and so on. This is quite a hard learning curve,

Starting point is 00:06:41 at least from my perspective. So I started by going to kubernetes.io, or whatever the domain is, and reading through the docs on how to install Kubernetes. I provisioned myself three Linux VMs and started with kubeadm and all that stuff, right? Basically building Kubernetes from scratch. And it was, okay, it's working. Then we thought, okay, how do we get our traffic into the cluster? And it was, okay, Googling and searching by myself. So if you're starting,

Starting point is 00:07:19 I would always recommend: if you start using Kubernetes, get help. If you don't have the knowledge in your company, bring in external knowledge, consultants or whatever, and start working with them. A lot of the mistakes I made in the past were just because I didn't know any better, or I couldn't find any better solutions on the internet. So yeah, running Kubernetes is quite a big challenge, especially if you're running it on-prem, right?

Starting point is 00:07:58 So running Kubernetes on-premises is a pain in the neck. I just wanted to ask a question about that, because it's an interesting topic. Andy, as we often discuss with guests, a lot of times people are experimenting on their own, just like you were, Christian, and it's hard to get anywhere until you can prove something out. So what you just proposed about getting external help, at least if it's paid, is one of those chicken-and-egg problems. How are you going to get budget for external help if you haven't proven something out and maybe haven't even gotten buy-in from your management yet?

Starting point is 00:08:32 Obviously, the successful migrations we've seen have always had upper management buy-in. But if you're doing this from the ground up as a submarine project, how would you recommend somebody go about it? Are there free external resources you're aware of? I know you mentioned the internet. At this point, as opposed to when you started, are there resources people can leverage

Starting point is 00:08:56 without having to go to a paid level yet, something they can start with? Or how would someone tackle that from your point of view, Christian? So when you want to deploy Kubernetes, there are great provisioners, tools you can start with that give you the ability to spin up a Kubernetes cluster,

Starting point is 00:09:19 a fairly stable one, without deep knowledge. For example, I stumbled across Rancher, which is an open source project, although you can also buy professional support from them. If you want to provision some clusters, they have quite good documentation on how to do things like configuring ingress and load balancing on your cluster, and so on. So I would recommend doing something like that, or using other tools like kOps, VMware Tanzu, or whatever your company is able to use. But it's still the case that when you search the internet, you find, for one Kubernetes setup

Starting point is 00:10:10 or for whatever you want to achieve, tons of different documentation and guides. What I always do when I need to bring in something new, like a new storage provisioner or whatever, is look at the CNCF landscape to see which tools are available: not necessarily officially supported, but projects within the wider Kubernetes ecosystem, right? Rather than relying on some weird GitHub repo with a few scripts in it.

Starting point is 00:10:48 That's basically what I can tell people. And even when you're starting with Kubernetes, you have to think about what kind of workload you want to run on your cluster. It's not just about building a cluster, moving everything from your big monolith into a container, and throwing it into the cluster. I don't think operating that cluster will make you happy, because you will find a lot of side effects here and there. One of the resources I found recently is called TechWorld with Nana.

Starting point is 00:11:26 She is an amazing YouTuber on technology topics. She's actually in Austria, from Vienna. I reached out to her, and Brian, we may have her on one of the upcoming podcasts, hopefully once she has time. She did an amazing set of tutorials with several hundred thousand views.

Starting point is 00:11:47 Most of them are free on YouTube, giving you a four-hour or an eight-hour course on getting started with Kubernetes. We'll add the link to this as well. But Christian, what I hear from you, and this is also part of your presentation, right, is: don't start by installing Kubernetes from scratch.

Starting point is 00:12:08 By now, time has moved on, and there are many options to get started with Kubernetes. Maybe you even just go to your favorite cloud provider and spin up a managed instance there. Yeah, absolutely. But even if you're starting with a managed Kubernetes cluster, whether it's a GKE or an EKS cluster, or whatever it's called in Alibaba Cloud, I don't know.

Starting point is 00:12:35 I would say if you only have to support one kind of application, or your company is really small and you don't have a lot of dependencies across your environment, like services that need to talk to services in your on-prem environment, a lot of routing and so on, then spinning up an EKS cluster is really easy, right? But as soon as you have to connect all the services with your legacy applications

Starting point is 00:13:07 and so on, you still need to figure out how you size your VPCs, how you do routing there, how you do security, and all the other stuff you need to think about. And then, of course, there's something like load balancing. One of the things I learned running Kubernetes on-prem is that if you want to run a load balancer, you first need something like MetalLB in place, so you can have a floating IP around your cluster, and then an ingress controller on top to distribute the traffic inside your cluster. And if you're running this on, let's say, a managed system, then you have to think about the costs as well, right? Sure, you can install the AWS Load Balancer Controller,

Starting point is 00:13:57 and every time you deploy a service, it will spin up an entire load balancer for you, which is great. But if you have a couple of hundred services running on your cluster, it gets quite expensive. With 400 load balancers in place, you should start thinking,

Starting point is 00:14:17 okay, I need an ingress controller, because not every service needs its own load balancer. And so on. So there are still considerations you need to make when running in the cloud, but it gets much easier, because a lot of the issues I had were simply because we didn't know how to configure

Starting point is 00:14:40 our worker nodes, or didn't have best practices in place for them. Let's say some system settings on, in our case, CentOS machines, right? So, a little story here. I had an issue with CentOS and the XFS file system where the SLUB unreclaimable cache was filling up and worker nodes were crashing all of a sudden. This was all because of a lot of mounts made by one of our deployments.
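One way to watch for the symptom Christian describes is to track unreclaimable slab memory, which Linux reports as the `SUnreclaim` field in `/proc/meminfo`. The sketch below is a minimal, hypothetical node check: it parses meminfo-style text and reports `SUnreclaim` as a share of `MemTotal`. The 10% threshold and the sample numbers are illustrative assumptions, not values from the episode, and the check only surfaces the symptom; it does not diagnose the XFS mount issue itself.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: value}; values are in kB."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def unreclaimable_slab_ratio(text: str) -> float:
    """Share of total memory held in unreclaimable slab caches (SUnreclaim / MemTotal)."""
    info = parse_meminfo(text)
    return info["SUnreclaim"] / info["MemTotal"]


# Hypothetical alert threshold, chosen for illustration only.
THRESHOLD = 0.10

if __name__ == "__main__":
    # Made-up sample in /proc/meminfo format; on a Linux node you would read the real file.
    sample = (
        "MemTotal:       16000000 kB\n"
        "MemFree:         2000000 kB\n"
        "SUnreclaim:      2400000 kB\n"
    )
    ratio = unreclaimable_slab_ratio(sample)
    print(f"SUnreclaim is {ratio:.1%} of MemTotal")
    if ratio > THRESHOLD:
        print("warning: unreclaimable slab memory is unusually high")
```

On a real node, passing the contents of `/proc/meminfo` to `unreclaimable_slab_ratio` gives the live value; a steadily climbing ratio is the kind of early signal that would have pointed at the slab cache before nodes started crashing.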

Starting point is 00:15:17 And we didn't know why. So you have to dig into the kernel crash files and read through a lot of kernel bug reports to find out that when we changed the file system to ext4, we didn't have this issue anymore. And this is something you don't have to think about when you're running on an EKS cluster or another managed cluster, right? But you still have to think about other things, like the load balancing I mentioned before, right? Yeah, so, I mean, you bring up a lot of points that,

Starting point is 00:15:55 I mean, I haven't run into each of those, because I'm not as deep in the weeds, having to provision and manage these Kubernetes clusters for a whole organization. But in the work we've been doing with Keptn, we have been providing different ways to stand up Kubernetes clusters. If we give people instructions on how to use EKS or AKS or something like that, that's kind of easy. But then people are not familiar with Kubernetes, and maybe they want to run it on K3s, as you know, right? Or they've been using kind or MicroK8s.

Starting point is 00:16:28 But my wrong perception was that I'd run a command, have a Kubernetes cluster, and then just deploy my apps without taking care of anything else, because Kubernetes takes care of everything. Then I realized that I have to understand what an ingress is and what I can configure on an ingress. The top question I get now on the Keptn Slack channel is: how can I make sure my ingress exposes my services to the outside world with SSL, with TLS?

Starting point is 00:17:03 And these are all things that I, coming from my background as a developer, never thought of. I never had to think about networking. I never had to think about security. And all of a sudden, if you really want to use this platform as a quote-unquote self-service magic tool, you have to take care of this, right?

Starting point is 00:17:19 And this is where I think some of the misconception always comes in. It is challenging. Yeah, it is challenging. And you mentioned the developers: developers should focus on writing code, not on how networking is done and so on.

Starting point is 00:17:39 But you need to understand the basic concepts of Kubernetes, for instance how you call another service within your cluster, right? What I always see is developers deploying a service in the Kubernetes cluster, defining an ingress, and then calling another service through the ingress. So the traffic flows out of the Kubernetes cluster to the public internet and back into the Kubernetes cluster, which basically adds latency for your service. Well, latency and costs. And costs, yeah, absolutely. And I also had no clue about that, right? I mean, again, when I started, and now a year and a half into it, I completely agree with you. Next time I start

Starting point is 00:18:25 with something like this, I'll take the time to sit down and go through documentation and some training, because you can avoid a lot of basic mistakes like exactly this one, and otherwise you are not even aware

Starting point is 00:18:35 of these mistakes. Or, you know, you talked about the load balancers. If I spin up EKS and just say load balancer ingress, oh, that's awesome, everything works fine, and then you get the bill at the end of the month. Yes, exactly, exactly. And then there are other mistakes. There's one slide in my presentation with the Oprah meme, "everybody gets admin," right?
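Christian's earlier point about services calling each other through the ingress has a simple alternative: call the Service by its cluster DNS name, which Kubernetes resolves as `<service>.<namespace>.svc.<cluster-domain>`. The cluster domain defaults to `cluster.local` but can be configured differently. The helper below is a hypothetical sketch that only builds such a URL. The `orders` service and `shop` namespace are made-up names, and the scheme is plain HTTP because in-cluster TLS depends on your own setup.

```python
def in_cluster_url(service: str, namespace: str, port: int, path: str = "/",
                   cluster_domain: str = "cluster.local") -> str:
    """Build a URL that resolves via cluster DNS, so the call stays on the cluster
    network instead of going out through the public ingress and back in."""
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}{path}"


if __name__ == "__main__":
    # Hypothetical service "orders" in namespace "shop", listening on port 8080.
    print(in_cluster_url("orders", "shop", 8080, "/api/orders"))
```

Run as a script, this prints `http://orders.shop.svc.cluster.local:8080/api/orders`. Pods in the same namespace can usually use the short name alone (for example `http://orders:8080`), because the pod's DNS search path includes its own namespace.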

Starting point is 00:19:06 This was in the beginning. I didn't have a clue about RBAC and all the security stuff in Kubernetes. That was something I learned afterwards, after developers deleted entire namespaces, and not just application namespaces but environment namespaces, and then how to prevent stuff like that. Or, let's say, if you forget the host field in your ingress definition, in our case,

Starting point is 00:19:37 it will create a wildcard ingress for you that matches every host. So all of a sudden, all traffic was routed to that one deployment, and everybody else was yelling, "What's happening? When I call my URL, I'm getting the other service!" I could mitigate this using the Open Policy Agent, for instance, but you also need to think about RBAC. Don't give everybody admin access, right? Because you would also not give everybody

Starting point is 00:20:14 full access to all the production servers all the time. Yeah, absolutely. And I think this is the challenge again, the balance we all preach, at least in the work I do: we preach autonomy, we preach giving everybody more responsibility, but still giving them enough guardrails so that they cannot make mistakes. And I think this is the challenge now with Kubernetes: it's a great platform if you use it right, but it's also such a huge platform that not everybody should have access to do everything. You need basic knowledge, and you have to figure out what makes sense to put into the hands of certain people, what doesn't make sense,

Starting point is 00:20:53 and where you need to provide processes using tools, like continuous delivery tools, that can automate certain things, right? Yeah. So a big part is documentation: having documentation in place for the developers, and templates they can reuse. For instance, I created a Helm template that covers most of our deployments and prevents things like accidentally exposing a service to the internet when it doesn't need to be exposed, right? It also sets resource limits on the deployment by default. Because dealing with resource limits and requests in Kubernetes was also a big learning for me.

Starting point is 00:21:50 On our first cluster, I was always wondering, okay, why are the nodes exploding? Why are they not working anymore? What is happening there? And then I figured out that the pods were using too much memory. Java applications have quite high memory consumption, right? So what I did first was set resource limits on namespaces, so they cannot overutilize RAM or CPU, and also build this into our template, so it won't be forgotten anymore when people deploy something.
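The guardrails mentioned here (no ingress rule without a host, resource limits on every container) can also be checked before anything reaches the cluster. Christian mentions the Open Policy Agent and a Helm template; the Python function below is neither of those, just a minimal stand-in sketch that lints a manifest already parsed into a dict (for example with a YAML library). It assumes the standard `Ingress` and `Deployment`-style field layout, and the manifest names are hypothetical. For namespace-wide defaults and caps, Kubernetes itself offers the `LimitRange` and `ResourceQuota` objects.

```python
def lint_manifest(manifest: dict) -> list:
    """Return findings for one Kubernetes manifest that has already been parsed into a dict."""
    findings = []
    kind = manifest.get("kind")
    name = (manifest.get("metadata") or {}).get("name", "<unnamed>")
    spec = manifest.get("spec") or {}

    if kind == "Ingress":
        for i, rule in enumerate(spec.get("rules") or []):
            # A rule without "host" applies to all incoming hosts, i.e. it is a catch-all.
            if not rule.get("host"):
                findings.append(f"Ingress {name}: rule {i} has no host (matches all hosts)")

    if kind in ("Deployment", "StatefulSet", "DaemonSet"):
        pod_spec = (spec.get("template") or {}).get("spec") or {}
        for container in pod_spec.get("containers") or []:
            limits = (container.get("resources") or {}).get("limits") or {}
            for resource in ("cpu", "memory"):
                if resource not in limits:
                    findings.append(
                        f"{kind} {name}: container {container.get('name')} has no {resource} limit"
                    )
    return findings


if __name__ == "__main__":
    # Hypothetical manifests, written as dicts to keep the sketch dependency-free.
    ingress = {
        "kind": "Ingress",
        "metadata": {"name": "web"},
        "spec": {"rules": [{"http": {"paths": []}}]},
    }
    deployment = {
        "kind": "Deployment",
        "metadata": {"name": "api"},
        "spec": {"template": {"spec": {"containers": [
            {"name": "app", "image": "example/app:1.0",
             "resources": {"limits": {"memory": "512Mi"}}},
        ]}}},
    }
    for m in (ingress, deployment):
        for finding in lint_manifest(m):
            print(finding)
```

Run as a script, this prints one finding per hypothetical manifest: the host-less Ingress rule and the missing CPU limit. In a pipeline, a non-empty result would typically fail the merge request rather than just print.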

Starting point is 00:22:32 Right. And yeah, stuff like that. So documentation, and training for the developers when they want to start using Kubernetes, because building a Docker image is quite easy, and deploying something to Kubernetes is quite easy, but does it always make sense? That's the other question, right? Yeah, that's actually a good one. So is Kubernetes by default the right choice

Starting point is 00:22:58 for every type of app? Or might it be better to think about other platforms? I don't know, it could be that you just deploy a container on ECS or Fargate, or maybe serverless is better, or maybe it's just an old-fashioned VM that you run somewhere. Yeah, the best example I have here is websites, static websites. Why should you deploy a static website as a container or deployment in

Starting point is 00:23:27 Kubernetes when you can put it into an S3 bucket and you're good to go, right, instead of running a container? Do you run regular training sessions? How does this work? Do you do regular training sessions with developers, or do you wish you had them? Yeah, so from time to time, when I see something popping up like, "Hey, we want to deploy something to Kubernetes," I review the application and what they are doing there, give guidance, and do training. It depends on the knowledge of the developers. But in the end, it all comes down to governance: when somebody is writing a new service, an architect should look into it and decide,

Starting point is 00:24:17 okay, it doesn't make sense to run it on Kubernetes. You shouldn't use Kubernetes only because you can spin up an environment very fast without having to tell anybody else, right? Because what I saw when we started using Kubernetes was that every developer was throwing their application into a container and onto the cluster, because then they didn't need to go over the, I don't know, the hurdle. The hurdle, yeah, whatever. Sorry, I mean they didn't need to create a merge request

Starting point is 00:24:52 in our Havotira configuration, right? So they used Kubernetes to bypass this, even if the service doesn't make sense to run in Kubernetes at all, right? You know, what I'm hearing a lot of in this conversation, and thank you both, I've been quiet because I've been trying to listen and learn, is that there seems to be a disconnect with the general marketing of Kubernetes, and I'll use that term loosely, because there's no Kubernetes company per se, right? The general marketing of Kubernetes is: pop it in, everyone can be self-sufficient, and things just run smooth and easy, right?

Starting point is 00:25:35 Most of us at this point know it's not as simple as that. But Christian, you mentioned things like training and documentation, all these things that for decades people have been trying to move away from. We're obviously not moving away from them now. The way I see Kubernetes, and I don't mean to say that Kubernetes is a loss, because Kubernetes is opening up a whole new world of great things, is that there's still a lot of overhead, maintenance, and scripting, just of a different type. And the benefit of Kubernetes is that,

Starting point is 00:26:14 as opposed to, let's say, the old days, when you had to stand up your servers, have a physical location to run them yourself, couldn't easily pay someone else to do it, and had to pick and set up your network and do all that other kind of stuff, you can now spend time automating a lot of your deployment process, which gives you back the time to do things

Starting point is 00:26:40 like the documentation, the training, and the other bits that are still going to... Some of these things are not going to go away. We're still humans. We're still stupid, meaning we don't have these things programmed into us. And if, you know, program Andy crashes, we can't rely on program Andy being there. So this has to be documented. So what I'm trying to say, in a long-winded way, is that not all the things we hate about traditional setups go away, and the critical functions still remain, yet the advances in technology give us the ability to automate and push past a lot of the other things we would normally have to do alongside the documentation and everything else. So it's not a net loss.

Starting point is 00:27:25 It's still a net win. But I think most people have to come to the realization that this is not some magic fairyland. If we think back to Cloud Foundry, the idea was: here's my code... I forget what the haiku was, Andy. Here's my code. Deploy it.

Starting point is 00:27:39 I don't care where. You still have a whole team of people maintaining that Cloud Foundry thing, but at least for the developer on that side, it's a bit more abstracted. Here, as you're saying, there are a lot of things you still have to know and learn. So I guess I'm just bursting the bubble that it's not

Starting point is 00:27:55 just plug-and-play. Did plug-and-play ever come to fruition? Remember way back when plug-and-play came out? Or am I dating myself? It's still going to be tough. There are still a lot of things to deal with, and that's why I think this guide you put together is so awesome, because it covers a lot of them. Earlier I was asking whether there are any resources out there for people; although this isn't a definitive resource, it gives you a lot of things to say, hey, what are some check marks we should go through,

Starting point is 00:28:21 you know? Anyhow, I'll shut up now, I've been rambling. No, that was good. And, as I already told Andy yesterday, when you're building a Kubernetes environment or Kubernetes cluster, it's not only the platform the developers are using. In the end, you're building a small data center within your data center, or within the cloud, right? With all the different dependencies like networking, storage, and so on. And you need to think about that stuff, and you need people who take care of it as well. I mean, I had a little smile on my face when Google announced their

Starting point is 00:29:06 Autopilot feature for Kubernetes last week, or this week, I don't remember, which takes away a lot of administrative tasks but also says, okay, you can now only use this kind of network provisioner, or CNI plugin, or whatever. Well, basically, in the end you can only control the complexity by reducing the number of potential combinations that make it so complex, right? That's why it becomes an opinionated platform. And we're back to Cloud Foundry, as I mentioned once

Starting point is 00:29:39 in an episode several months ago. Yeah, and I think they had an idea there. Yeah. I completely agree with you, Brian. And I think this is also what we all try to do, right?

Starting point is 00:29:51 We try to leverage the new shiny thing, but then we learn that, while it's great and powerful, in the end it doesn't make us more productive, or at least not the developers. Therefore we need to come up with a very clearly defined, prescriptive, opinionated path for doing 80% or 90% of the work. And for the rest, yes, we may need to look into other options outside our opinionated way. But I agree with you.

Starting point is 00:30:23 This is also why we invest so much in standardizing things, whether it's OpenTelemetry or the stuff we do with Keptn. In the end, it's all about making it easier to get work done on top of something that is very complex, but we also have to narrow down the complexity. This is also why, if you look at what we ended up doing with Keptn, we are really narrowing it down now to say, you know, if you want to try Keptn, you take a Kubernetes cluster and we want you to use, let's say,

Starting point is 00:30:59 Istio as a service mesh. In the first iteration of Keptn, we said it is deploy, test, evaluate. That was super easy and super clear. Now we've gone a step further, because we had people saying, well, we need more flexibility, but by default we still give this prescriptive approach and say: this is how we think you are most productive in deploying this particular type of technology, app, or service. If it covers 80% of your use cases, we think you're

Starting point is 00:31:32 going to be fine. And for the other 20%, you can turn some knobs and change our opinion to be closer to your opinion. And I think, Brian, we had a similar discussion in one of the episodes we recorded that hasn't aired yet as of today; I think it was with Baruch, if I'm not mistaken. Oh, yes, the Liquid Software one. Liquid Software, same thing. Yeah.

Starting point is 00:31:58 By the time people listen to this, it will have aired. It's the previous episode, but yeah. Christian, I know that at the end of your presentation, which we will share, you have a nice conclusion. And as Brian highlighted before we started the recording, there's a nice meme in there. It says the H in Kubernetes stands for happiness. But you also have a nice summary of the things you want to be careful with.

Starting point is 00:32:34 Don't deploy a production cluster without a review of a professional. That makes a lot of sense. And 20 people, we already covered that. The templates, I know you covered it slightly, but I have a question on templates. So if you provide templates, but you allow people to modify the templates, such as templates, but if you don't enforce them,

Starting point is 00:32:54 do templates alone help you? Or do you need governance or something else on top as well? Because otherwise people can do whatever they want with the templates. So when you have a lot of microservices and a lot of developers who are adjusting templates and so on, it's quite hard to get an overview of what they are doing there. At least that's what I've seen happening. But it could be different in other companies. Yeah, another

Starting point is 00:33:26 thing, right? A company that is only deploying one set of software is different from one that has a lot of different legacy systems flying around and then wants to transition to Kubernetes. But in my opinion, you still need some kind of governance where someone looks at new services and new deployments that are going to the Kubernetes clusters. Developers can play on one cluster if they want, right? Nobody wants to restrict them from playing around with stuff. But as soon as a service gets promoted to a higher stage, to an official integration environment or whatever, then somebody

Starting point is 00:34:11 should really look at what they are doing there and how the service is working. Normally, in my opinion, this is a classic task for the software architects, right? They review the service: what the service is doing, how the service communicates, and so on. Yeah. Another thing that I want to ask you: you mentioned earlier some of the things that happened that shouldn't happen, like no resource limits. You mentioned access control, and somebody accidentally deleting all the namespaces.

Starting point is 00:34:52 Any other horror stories? Oh, a lot, a lot. I think I've found every, yeah, Fettnäpfchen. And here again, a German word, I don't have it in English. Was that Fahrvergnügen? Fettnäpfchen, yeah. Literally translated, it means you're stepping into a puddle of fat. That means you make a bad step and then you do something that you shouldn't do, right? Sorry, are there puddles of fat lying around in Germany that

Starting point is 00:35:29 this came from? Anyhow, I'll look that up, how that came about. This is fascinating, how that term would come about. Yeah, a lot of vice was lying around in Bavaria. Yes, yeah, sure. So, for instance, a classic thing is tagging your Docker images and using something like the tag latest. I can remember I was searching around for a storage provisioner, right? So in the beginning, in our first or second cluster, I used GlusterFS combined with Heketi to provision persistent volumes. And yeah, it's classic copy and paste, and it's working, we have persistent volumes. But then I realized that the deployment of Heketi was using the tag latest. And nodes go up, nodes go down.

Starting point is 00:36:31 And in the end, it was using the image pull policy Always, which means every time the pod got restarted, it pulled the latest Heketi image. And all of a sudden our storage provisioning was not working anymore, and I was like, okay, what's going on here? Then you have to dig into the problem: why is it not working anymore, and so on and so on. And developers were using, for instance, a heavily used image, you know, Alpine, and they were using alpine:latest. And I can remember, in Barcelona at Perform one day, a vulnerability was found in the

Starting point is 00:37:16 Alpine image, I don't know, an empty root password or something like that, and they updated the Alpine image. And I was getting pinged by everyone in our company, a lot of people: hey, my service isn't working anymore in Kubernetes, Kubernetes is down, Kubernetes is down, help, help, the world is on fire. And I was like, yeah, but your pod isn't working at all, right? What should I do? Kubernetes is working; it's your pod that isn't starting up. Have you tried to run the pod locally, for instance? That's also something I've seen a lot,

Starting point is 00:37:54 that developers are just using the CI/CD tools to build their containers, throw them into the cluster, and really develop inside the cluster instead of checking, okay, is my Dockerfile building my application? So I get a lot of tickets like, hey, my deployment isn't working, and it's because your Dockerfile could not be built. So try it locally, fix it locally, and then you can avoid these turnaround times for fixing stuff like that. And especially because then you become the bottleneck

Starting point is 00:38:30 and you deal with things that you shouldn't deal with, because these are basic things that should be checked beforehand. Absolutely, absolutely. Yeah, I mean, for the latest tag, isn't that something where OPA comes in? The Open Policy Agent can be used to validate that no latest tag is used? Yeah, sure. Sure. I think it should be possible with OPA.
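Two mechanics sit behind the Heketi and Alpine stories. First, per the Kubernetes images documentation, when a container omits `imagePullPolicy`, it defaults to `Always` if the image is tagged `:latest` or carries neither a tag nor a digest, and to `IfNotPresent` otherwise, so an unpinned image is re-pulled whenever a pod restarts. Second, OPA rules such as "no latest tag" are normally written in Rego and enforced by an admission controller (for example Gatekeeper). The Python sketch below models both ideas on a pod manifest parsed into a dict. It is not OPA or Rego code, the helper names are hypothetical, and the image-reference parsing is simplified (it does not validate references).

```python
from typing import Any, Dict, List, Optional, Tuple


def split_image_ref(image: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'repo[:tag][@digest]' into (repository, tag, digest). Simplified."""
    name, _, digest = image.partition("@")
    last_slash, last_colon = name.rfind("/"), name.rfind(":")
    # A colon only introduces a tag if it comes after the last '/';
    # otherwise it is a registry port, as in 'registry.local:5000/app'.
    if last_colon > last_slash:
        return name[:last_colon], name[last_colon + 1:], digest or None
    return name, None, digest or None


def effective_tag(image: str) -> Optional[str]:
    """Tag the runtime resolves: 'latest' if neither a tag nor a digest is given."""
    _, tag, digest = split_image_ref(image)
    if tag is None and digest is None:
        return "latest"
    return tag


def default_pull_policy(image: str) -> str:
    """Model of the default applied when a container omits imagePullPolicy."""
    return "Always" if effective_tag(image) == "latest" else "IfNotPresent"


def latest_tag_violations(pod: Dict[str, Any]) -> List[str]:
    """Admission-style check: flag containers whose image resolves to ':latest'."""
    spec = pod.get("spec", {})
    violations = []
    for field in ("initContainers", "containers"):
        for container in spec.get(field, []):
            if effective_tag(container["image"]) == "latest":
                violations.append(
                    f"{field}/{container['name']}: pin a version instead of "
                    f"'{container['image']}'"
                )
    return violations


if __name__ == "__main__":
    # Hypothetical pod with one unpinned and one pinned image.
    pod = {
        "spec": {
            "containers": [
                {"name": "heketi", "image": "heketi/heketi"},
                {"name": "app", "image": "registry.local:5000/team/app:1.4.2"},
            ]
        }
    }
    for c in pod["spec"]["containers"]:
        print(c["image"], "->", default_pull_policy(c["image"]))
    print(latest_tag_violations(pod))
```

In a real cluster the same rule would live in a Rego policy evaluated at admission time, so an unpinned image is rejected before it ever reaches a node.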

Starting point is 00:38:58 OPA, or however it's pronounced in English. I don't know. Open Policy Agent. OPA. And you know, Opa is German and means grandfather. Yes, my mom had a friend who was a grandmother.

Starting point is 00:39:15 Her German... I knew this from that, but I forgot what it was. I just know I heard it. I'm just pretending to be, you know, cool with you guys.

Starting point is 00:39:27 Yeah. But I have a question for Christian. Do you also say Oma and Opa, or do you say Großvater and Großmutter (grandfather and grandmother)? I'm saying Oma, Opa. Okay. But it depends on the region in Germany.

Starting point is 00:39:40 Yeah. Same here. Same here. I mean, Oma and Opa is very Austrian, or everywhere, I think. As you know, I'm a little bit Austrian.

Starting point is 00:40:08 It's funny, I know this is a total sidetrack, but I just have to say this: a lot of times there'll be music that I like, and I always want to go to Andy with it, but then I'm like, no, it's German, it's not Austrian. And I'm thinking, well, is it close enough? And then I'm like, well, is Canadian close enough to the United States? Yeah, totally different. They speak the same language, but it's different enough. So that's why I don't bother you with the German music I listen to. No problem. I'm normally the master of sidetracks. But speaking of this latest tag, just to clarify, right, because I'm pretty sure I understand what the deal is,

Starting point is 00:40:26 but for people who might not understand what people are doing, what it sounds like people are doing is instead of saying which version they want to get, because a lot of times when you're getting something from Docker or GitHub or something,

Starting point is 00:40:35 you just say latest, and that's a default tag that gets applied to whatever the latest push is, right? That's not something that people necessarily even put on their builds. But latest will always just get the latest.

Starting point is 00:40:48 So the recommendation would always be: use the specific version that you want instead of just latest, because obviously if you're always getting latest, as soon as someone updates it, you're going to get that new one and you have no idea what you're going to get. And another thing: it's a security issue as

Starting point is 00:41:04 well. If you're pulling images straight from Docker Hub, you are not aware of what is inside the image. Even popular public images could be compromised. So I always recommend having a set of base images in your own registry that developers can reuse, which you have scanned and released for official use. That's one of the things I would recommend instead

Starting point is 00:41:33 of running all this stuff. But since Andy asked me about other mistakes I've seen, and I should change my Twitter handle to grumpyadmin after the show, because I'm only complaining and complaining and complaining: Kubernetes is great. Kubernetes is great for automation and so on. But from time to time, you're thinking, okay... a classic, classic example.

Starting point is 00:41:59 Somebody is deploying an application and says, okay, I've deployed two pods. And you're like, okay, and where is your autoscaling configuration? And they say, I've deployed two pods, it's highly available, and I'm fine with that. And you're like, okay, why don't you use all the stuff Kubernetes is for, right? Automatically scaling your deployment. It's like deploying two VMs in a data center and you're good to go. No, that's not how it should work, right? And here we are again, back on the training story. Or health checks in the deployment: if you're getting an email from

Starting point is 00:42:39 one of the developers like, hey, can you restart my service in Kubernetes? You'll be like, what the... Right? Because this is a core capability, exactly what it's built for. I understand. I think it's great that you're complaining because, again, we always hear about the positive side of Kubernetes, and there's a lot of that, right?

Starting point is 00:42:59 This is the real world. What are the admins dealing with? What are the things that people are still... What's the people factor, the human factor, in leveraging it? You can build in whatever guardrails you want, but, as they say, you can't fix stupid. So you still have that human factor that's going to do things wrong and not leverage what's built in there.
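Christian's "two pods and I'm done" example is what the HorizontalPodAutoscaler is meant to replace. The Kubernetes documentation gives its core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipping the change when the ratio is within a tolerance (0.1 by default) and clamping the result to the configured minimum and maximum replicas. The Python sketch below reproduces only that arithmetic. The real controller also applies stabilization windows, readiness handling, and missing-metric rules, and the function name and default min/max values here are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule (no stabilization window or readiness logic)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: don't scale
    else:
        desired = math.ceil(current_replicas * ratio)
    # The HPA never goes outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Two pods averaging 90% CPU against a 50% target scale out to four.
print(desired_replicas(2, 90, 50))
# At 52% the ratio (1.04) is within tolerance, so the replica count stays at two.
print(desired_replicas(2, 52, 50))
```

The tolerance band is what keeps the autoscaler from flapping on small fluctuations around the target.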

Starting point is 00:43:21 It goes way back to when I had a job at a record store. We had a big suck because there was this island in the middle of the store where we had, you know, Walkmans and radios that you could buy, but the register was at the front, and over the register was a gigantic sign with an arrow that said REGISTER, you know, where to pay. And people would go stand at the other side of the island with their CDs in their hand, ready to buy them, and you're like, it's over here, under the big sign, right? You can't pull that out.

Starting point is 00:43:48 You still have... Someone's not doing a health check. Yeah, what do you say? How do you deal with that? That's something you can't program. I mean, fortunately and unfortunately, right? Because if we could be programmed, well, that's a whole different debate we won't get into. But if we could be programmed as easily as computers, then we wouldn't be human, I guess. Yeah, and I have to say, people are people, and

Starting point is 00:44:11 from time to time people are stupid. I see it every day at the moment, because in front of my window they are rebuilding a bridge and they closed the streets around it. And you would not believe how many people drive into the construction site, even big, big trucks, and then try to turn around and so on. There's a sign saying, hey, stop here, the street is closed, but people are still driving through. Yeah. Yeah.

Starting point is 00:44:43 Just a construction site. And it's the same with developers. It's the same with administrators. It's the same everywhere. And I'm not any better. No, no, no, not at all. We all make mistakes. It's good to hear about them, though.
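On the health-check point from earlier: an email asking someone to restart a service is exactly what a liveness probe is designed to make unnecessary. The kubelet runs the probe every `periodSeconds` and restarts the container after `failureThreshold` consecutive failures (3 by default), while any success resets the count. The Python sketch below simulates only that counting rule on a list of probe outcomes. It ignores `initialDelaySeconds`, timeouts, and restart back-off, and the function name is hypothetical.

```python
from typing import Iterable, List


def restart_points(probe_results: Iterable[bool], failure_threshold: int = 3) -> List[int]:
    """Return the indices of probe attempts after which the container would be restarted."""
    restarts: List[int] = []
    consecutive_failures = 0
    for attempt, healthy in enumerate(probe_results):
        if healthy:
            consecutive_failures = 0  # one success resets the streak
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts.append(attempt)
            consecutive_failures = 0  # simplification: the new container starts fresh
    return restarts


# Two failures in a row are tolerated; the third consecutive one triggers a restart.
print(restart_points([True, False, False, True, False, False, False, True]))
```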

Starting point is 00:44:56 Yeah, and there was also this episode, I cannot recall her name, the one on CD version 2. Tracy, Tracy, Tracy Ragan. There are a lot of cool concepts in it. And I got her point when she said, okay, we only need one, or that people think we only need one big Kubernetes environment, and throw every, I don't know,

Starting point is 00:45:23 development, integration, pre-prod, or production environment into one Kubernetes cluster. But as Andreas mentioned in this episode, how do you validate, if you want to, I don't know, upgrade your storage provisioner inside the cluster, or your ingress controller, that it will work? These are central components of your Kubernetes cluster. Or if you want to upgrade the Kubernetes version itself: I'm more on the side of testing it in advance on another system

Starting point is 00:45:59 before I run it on production. And, you know, it's the whole shiny new world, everything is blinking and everybody wants to use it, as you already mentioned. But not everything is gold, right? Yeah. And I think it's this whole thing of, you cannot run everything on one cluster.

Starting point is 00:46:24 This actually came from the session we had with Kelsey Hightower, I believe, or at least I heard it from him that you always need to have environments. You need to have stages because you need to test these changes on the underlying platform and you cannot just do everything in production. Considering

Starting point is 00:46:40 the time, because we have a hard stop with the recording in a couple of minutes, Christian, I want to do one quick thing with you. If somebody starts learning Kubernetes today, or tomorrow, because it's late here: what are the five terms? Every technology comes with new terms, right? What are the five terms everybody needs to understand so that they know what this is all about? I'll start with one so you know what I'm getting at.

Starting point is 00:47:07 Mine is kubectl, or kube control. Everybody needs to know that this is the primary tool you use to interact with Kubernetes. It's the command-line tool for Kubernetes. What else do people need to know about when they hear about it for the first time? So I would say the Kubernetes objects: what is an ingress? What is a service? What is a deployment? What is a StatefulSet, for instance?

Starting point is 00:47:32 What are operators? So the basic terminology, and also how traffic flows around in a Kubernetes cluster: how to reach other services, for instance over the internal DNS, you know what I mean. This is what I would recommend that people at least know about, right? And yeah, as you said, kubectl, and also how to monitor their deployments. So you can deploy OneAgent, the OneAgent Operator.

Starting point is 00:48:26 And I have to say, this was one of the biggest things for me. When it comes to monitoring, when I saw that Dynatrace released OneAgent, I was really happy about it, because before it was like: how do I use Prometheus, what kind of metrics do I need to pull from the API server, and so on. And with OneAgent, it's perfect.
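For Christian's point about reaching other services over the internal DNS: with the default `ClusterFirst` DNS policy, a Service is resolvable as `<service>.<namespace>.svc.<cluster-domain>`, where the cluster domain is usually `cluster.local` but is configurable, and a pod's resolver search list lets short names like `<service>` or `<service>.<namespace>` work too. The Python sketch below models the order in which names would be tried, assuming that default search list and `ndots:5`, which is what the kubelet typically writes into a pod's resolv.conf. Host search domains are left out, and the service and namespace names are hypothetical.

```python
from typing import List


def search_domains(pod_namespace: str, cluster_domain: str = "cluster.local") -> List[str]:
    """Search list a pod typically gets (host search domains omitted)."""
    return [f"{pod_namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]


def lookup_candidates(name: str, pod_namespace: str, ndots: int = 5,
                      cluster_domain: str = "cluster.local") -> List[str]:
    """Order in which a resolver would try names for a lookup from a pod in pod_namespace."""
    if name.endswith("."):  # fully qualified: no search-list expansion
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search_domains(pod_namespace, cluster_domain)]
    # Names with fewer than `ndots` dots try the search list first.
    if name.count(".") < ndots:
        return expanded + [name]
    return [name] + expanded


# From a pod in the hypothetical 'shop' namespace, 'orders' is tried within 'shop' first,
# while 'orders.billing' reaches the Service in the 'billing' namespace on its second try.
print(lookup_candidates("orders", "shop")[0])
print(lookup_candidates("orders.billing", "shop")[:2])
```

The high `ndots` value is also why lookups of external names from inside a pod can generate several extra DNS queries before the real name is tried.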

Starting point is 00:48:53 Can I throw another question? Not a question, but like to this idea, Andy, that you're bringing up there. Now, again, I'm only in theory, right? I'm on the pre-sale side. I'm not dealing with this in real life. But based on conversations we've had with, let's say, Kelsey and in theory, right? I'm on the pre-sale side. I'm not dealing with this in real life. But based on conversations we've had with, let's say, Kelsey and some others, right? Would it also be a good idea for anybody moving into Kubernetes to be able to answer why they're using Kubernetes?

Starting point is 00:49:16 Like in one or two paragraphs or less: why are you using Kubernetes? And if they can't answer that, don't start. Maybe. I mean, that's a little extreme, but I think this is actually a great point at the end of Christian's presentation, the last point: think about how you want to deploy your apps before you start using Kubernetes. So think about this first, not the reverse; really, before you even get started, answer that question. Yeah, like I said before, you have to think about what workload you want to run on Kubernetes and what you want to achieve. What do you think is the advantage

Starting point is 00:49:55 you want to utilize from Kubernetes to run this workload? Right? Yeah. All right, Brian, I know we have a hard stop. That's why I think we want to conclude here. Christian, it was a pleasure having you. I know this is not going to be the last time; I know we also have a lot of other things planned. We will be speaking at the Red Hat Summit

Starting point is 00:50:16 and on the Red Hat podcast; they've also invited us, so that's going to be great. I always keep telling you to start saying no at some point, because you have a lot of work to do in your regular life, but I'm still happy that you often say yes when we ask you. So thank you so much. You're welcome, you're welcome, and it's a pleasure to be on the show where people like Kelsey Hightower and so on were guests as well. But do you have a pair of sneakers here? That's the big question. Not from Perform 2021. I have my pair of sneakers from 2020. So you at least have a pair.

Starting point is 00:50:50 Okay. But I'm also wearing, and the podcast listeners will not see it, we can take a screenshot... You've got the nice Dynatrace socks. As an employee, I got the crummier ones. I got the $2 version. You have it?

Starting point is 00:51:06 Yeah, I got it. Those are awesome. All right, well, thank you very, very much, Christian. Do you have any social media you want people to follow? LinkedIn, Twitter, anything where you're spouting off all of your brilliant observations? Or do you have

Starting point is 00:51:21 a LinkedIn? I have a Twitter account with a very professional handle. It's @Wurstsalat. At what a lot? Wurstsalat, sausage salad in German. Oh, that's great. And don't expect any, let's say, useful content there. Okay, perfect.

Starting point is 00:51:46 All right, really, thank you for being on. Andy, was there anything else, or should we go ahead and wrap up? No, I think I'm good. Thank you so much, Christian. That's really it. I'll see you soon. If anybody has any questions and comments,

Starting point is 00:51:57 you can reach us at pure underscore DT on Twitter or send us an email at pureperformance@dynatrace.com. Thank you so much for listening, everybody. And Christian, thank you so much for being on. This was very enjoyable. And Andy, as always, thanks for being awesome. Bye, everybody. Bye-bye.
