PurePerformance - Why GitOps is not Git plus Automation for Ops with Roberth Strand

Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's another episode of Pure Performance. My name is Brian Wilson and as always I have my most amazing co-host Andy Grabner with me, who likes to mock me during these intros, but none of you can see that. Anyway, hi Andy. Hi. You meanie. No, I'm not a meanie.

Starting point is 00:00:44 I'm just worried now that at some point, because you're recording this, and I think you're recording it with the video, that at some point there will be leakage of the footage. And I just gave it, I'm so stupid, I just gave it the idea. I should go back. See, the thing is I delete all that stuff, though,

Starting point is 00:01:01 but I should just start compiling it and do a compilation of Andy making fun of me and doing his... because you love torturing people. I tried the word but I can't get the word. But anyway, yes, I butchered that, I'm sure. Yeah, that's good. Gotta throw that in always. Anyway, hey Andy, how's it been? How you been doing? Everything's good?

Starting point is 00:01:28 You good? You alright? Overall good, just a little bruised. Okay. I listened to Bruce Springsteen this weekend as well, so I guess that was bruised. Yeah, almost. No, I made a mistake on the weekend and I didn't admit it to my wife, even though she gave me shit about it, and now I admit it officially. I know I should not go skiing just believing that because the weather is warm I can ski in shorts and short sleeves, and believing that I'm so good

Starting point is 00:01:59 that I never crash and fall down but now I did crash and did fall down, and I have bruises all over my knees and my everything. So, Gabi, if you listen to this, you are right. Next time, I will wear it. That was part of my wedding vows, is my wife is always right. But yeah, anyway, I learned a really good way to start every day when people ask you how you're doing.

Starting point is 00:02:19 You say, I'm awake, and I'm not crying. So, it's always a good start of the day. So we'll leave it there. We'll leave it there. Now the question is, how do we find the segue now to introduce our guest? Well, you were skiing. I was skiing, yeah. You were amongst snow.

Starting point is 00:02:39 I remember seeing on YouTube, there's this live video that they run in the winter. It's called the North Pole Train. And it's a live feed of this train that goes all the way up towards the top of this country. Country being Norway. And they have snow. So maybe we do the snow connection? I don't know. The snow connection?

Starting point is 00:02:57 Well, maybe to the clouds. Because if you're in the mountains, you're closer to the clouds. As we move to the clouds, we have more things to basically configure and write down and store and version control. You just configured your skiing yaml. Correct. Exactly. Because I forgot my long sleeves. Hey, you know what? I think let's stop this because otherwise Robert will never get a chance to say something. Robert, thank you for being here.

Starting point is 00:03:28 Thank you for enduring these long two or three minutes. How are you doing? I'm doing fine, thank you. Just like a preface thing, I had surgery on Friday, so I'm also bruised. And I wish it was Bruce Springsteen, but it's nothing serious. It's removing a gallbladder, but any surgery can be painful. So I've been laying around basically since Friday, and today is Tuesday.

Starting point is 00:03:56 So I'm a little bit groggy, a little bit not up to my normal brain capacity speed, but we'll see how this goes. We added a disclaimer and we should say the words that came out of his mouth, he was maybe not fully under his mental control. I also have ADHD, so that is kind of like the normal disclaimer anyway. I just say stuff. Hey, Robert, coming back to the interview though, because I would like to learn a lot from you. So does Brian, I assume, because we always say the best thing of this podcast is we learn from our guests.

Starting point is 00:04:32 And we met, I think it's now a little over a year ago, at KubeCon at a CNCF ambassador reception, which means we're both ambassadors. Yes, that is correct. How long have you been an ambassador? Well, they rebooted the program at that point, right? So I've been one since the reboot. I wasn't a CNCF ambassador under the previous program. I'm not totally sure how that actually hangs together.

Starting point is 00:05:00 So I applied for renewal. So that will be my one year anniversary, so to speak. Yeah, same here. Keeping fingers crossed that the renewal will actually go through. Yeah, well, we'll hope so. I've done a lot since then for the CNCF stuff. I'm also a Microsoft MVP, and unfortunately, I haven't had that much time.

Starting point is 00:05:21 So I'm a little bit more unsure about that one. But I've been there for three years, so hopefully the fourth year. Yeah, cool. We'll see. Yeah, and I'm not sure when this one will air, but probably it will air at around the time when the two of us will actually be in Stockholm together at the Container Days. Are you going to present?

Starting point is 00:05:43 Are you going to, what are you doing there? No. It's kind of funny going to these things when I'm not presenting, because people always kind of assume that I'm going to do something. But I'm on the Cloud Native Nordics Slack, and obviously the name implies that people from the Nordics are there. So it's kind of like a collaboration: we in Norway, Sweden, Denmark, Finland, and also Iceland and so on, we have a little Slack channel where we can share stuff. And that event just popped up and I went, like, I haven't been to Stockholm since, well, forever, so I'll just go. Cool. Well, then do me a favor, don't boo me off stage, okay?

Starting point is 00:06:26 I will try to. Anybody who's there should bring rotten tomatoes and boo Andy, please. You just invited another bad idea. But here's the thing,

Starting point is 00:06:39 right I take this opportunity today to learn from you on some of the things I'm going to talk about. So that means if you are teaching me something today that the audience doesn't find useful,

Starting point is 00:06:51 then I can blame it back on you. Yeah, sure. Now in all seriousness, I know when we discussed this podcast there were different topics floating around. I looked back in our chat history: you talked about open source and enterprise, platform engineering, Kubernetes, GitOps, and really, GitOps was the term that

Starting point is 00:07:11 I would start with because I believe, while we just had an episode before this where we touched a little bit on GitOps and also OpenGitOps, I would like to hear your take on it and why you actually brought up the topic. Why is GitOps so important? What do people need to understand about what it is and maybe what it's not? And also, if people are considering going down that route, what does it really mean to implement GitOps, especially in an enterprise? Yeah, so at some point I'm going to have an article about the importance of learning what terms mean.

Starting point is 00:07:54 Because it seems like we, or maybe it's the other way around, maybe we should try to make terms that make more sense. Because we kind of just slam together words and then people logically jump to conclusions based on those words. And then we get misunderstandings. Right. We have that for DevOps. You know, I usually say that DevOps engineer is the worst role ever, because that can't be a role, because DevOps is not a role thing.

Starting point is 00:08:21 You know, everyone should be doing DevOps. So that doesn't make sense. Same with GitOps. People have a tendency of hearing GitOps and thinking Git plus ops, which means just put stuff in Git and then automate it. And there you go. So they're just using GitHub Actions, Azure DevOps Pipelines, you know, even Jenkins, whatever they do to automate things, their CI/CD platform of choice. And then they kind of just go, all right, we'll, you know, kubectl apply in pipelines, and then we have GitOps. But it's not. It's a more specific term, which came originally from

Starting point is 00:09:02 a blog post by Alexis Richardson of, well, I should now probably say formerly of, Weaveworks. The company that kind of coined the term, they kind of created the entire thing, and we've kind of been building on top of that now for several years. And the blog post became kind of the basis of the GitOps Working Group in the CNCF under

Starting point is 00:09:27 TAG App Delivery, where we started creating the GitOps principles. And the GitOps principles were created to kind of have something very specific to point people to when people ask, like, what is GitOps? And the way that I see it, if you have even a slight deviation from those principles, it's not really GitOps. Then you're using a different tool and trying to make it GitOps-like, but it's not actually GitOps. And those principles are, well, I can list them. The first one is that it needs to be declarative. So it's all about desired state.

Starting point is 00:10:14 You want to define your desired state in a declarative fashion, so you don't say do A, B, and C, which you kind of do with a pipeline, right? You define basically just shell commands, more or less, and that's very imperative. It needs to be versioned and immutable. So, you know, with Git, every time you make a change, you add a new version of your state. You don't change stuff in place or add on; you have an entire new version. So you have something to go back to in case something doesn't work. You have something very specific to point towards, and that actually kind of takes away the need to only do Git. It's not only Git that is

Starting point is 00:10:52 a versioned and immutable storage. So blob storage, Azure blob storage, S3 buckets, those are versioned and immutable. So you could use that as kind of like the storage for GitOps. And, you know, backtracking a little bit, we were kind of discussing when the GitOps working group started to change the name, simply because people actually just hear GitOps and kind of go, all right, put stuff in Git and then you do operations. It needs to be pulled automatically and pulled is kind of the key here. So with GitOps, you have something in your system that pulls in the changes. You don't push the changes in from an external source.
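The "versioned and immutable" and "pulled automatically" ideas described here can be illustrated with a small toy model. This is only a sketch under stated assumptions: `StateStore`, `Version`, and `Agent` are hypothetical names invented for illustration, not part of Git, Flux, Argo CD, or any storage service, and a real agent would run inside the cluster and poll on a timer.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Version:
    """One immutable snapshot of the declared (desired) state."""
    number: int
    state: MappingProxyType  # read-only view, so a snapshot cannot be edited


class StateStore:
    """Hypothetical append-only store: every change creates a new version."""

    def __init__(self):
        self._versions = []

    def commit(self, state):
        version = Version(len(self._versions) + 1, MappingProxyType(dict(state)))
        self._versions.append(version)
        return version

    def revert_to(self, number):
        # A rollback is itself a new version; history is never rewritten.
        return self.commit(self._versions[number - 1].state)

    def latest(self):
        return self._versions[-1]


class Agent:
    """Hypothetical agent living in the target system. It pulls from the
    store; nothing outside pushes changes into it."""

    def __init__(self, store):
        self.store = store
        self.applied = None

    def pull(self):
        """Fetch the newest version; return True if it differs from what is applied."""
        latest = self.store.latest()
        changed = self.applied is None or latest.number != self.applied.number
        self.applied = latest
        return changed


store = StateStore()
store.commit({"image": "app:1", "replicas": 2})
store.commit({"image": "app:2", "replicas": 2})

agent = Agent(store)
agent.pull()          # the agent picks up version 2 on its own
store.revert_to(1)    # rolling back adds version 3 with version 1's content
agent.pull()          # the next pull applies the rollback
```

The design point the sketch tries to capture is that the store never edits an old version, and the only actor that touches the running system is the agent that lives inside it.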

Starting point is 00:11:43 You point your system towards the source of truth, and it needs to monitor that. And then you have the fourth one, which is continuously reconciled, which is obviously then when it looks at the source, if it sees the change, it will always try to apply that change. So as soon as they have gated deployments and all those kind of things, that kind of breaks off the GitOps. GitOps is supposed to be automatic. As soon as you have a new version of your desired state, it should be automatically applied, more or less. Which is the point kind of when we start talking to people that come from a more traditional software development background,

Starting point is 00:12:26 for instance. When they hear that as soon as something new is in Git it should be applied, they kind of get scared, because they're used to having all these steps of making sure that everything is ready to be applied. But the point is, all those steps need to come before you actually update your desired state. You need to test your code. Everything needs to be ready to be deployed. You said test your code? What are you talking about? It's like a magical thing that some people do. So, you know, I usually talk about CI/CD, continuous integration, continuous delivery, and then you have continuous deployment, which is GitOps. So the continuous delivery part of a CI/CD pipeline is to have your application or service in a deliverable fashion. So it's ready to be deployed.

Starting point is 00:13:27 Then deployment should just work, right? And then obviously, you know, for very easy setups, you can just have that. But in the more advanced setups you could have progressive delivery tools involved, you need some sort of policy engine to kind of make sure that everything's all right. You need to have, you know, vulnerability checks, etc., etc., etc. So you kind of have to have a lot of tools at that point, but you should probably have that anyway, even if you're manually pushing stuff into your clusters.

Starting point is 00:14:01 I took a lot of notes because I have a lot of questions, and I may want to challenge some of the things you said, or at least understand them. So for instance, you brought up progressive delivery, right? A classic example would be, in the Kubernetes world, if you're using Argo Rollouts, a blue-green deployment or a canary deployment. A canary deployment would be a great example of that. So rolling out version number two in a staged fashion. Now in GitOps, in Git, I would basically define my desired state

Starting point is 00:14:31 that I want to get from version one to version two in a canary rollout. However, what's actually happening is that Argo Rollouts is then basically making changes to the replica set in order to get slowly to that desired state, I guess. So isn't that also a little bit contradictory to what GitOps means? It can take a little while until we get to the desired state.

Starting point is 00:14:58 And actually, what would happen if the rollout would fail? Let's assume version 2 is not good. Is the best practice to then actually open up a pull request and revert that Argo rollout definition back to version one? So I'm a Flux person. In the terms of Flux, we have Flagger, which does the same thing. And Flagger is actually part of the Flux project. So even though it has a different name, it's part of the same project. So if you take away the progressive delivery part and you update your desired state to use a version of your application that doesn't work, you would have a failure, right? And at that point you would

Starting point is 00:15:57 go, oh, you know, stuff went bad, git revert back to the state, the version, the state that actually works. It's the same with progressive delivery, because at some point, if that doesn't work, the progressive delivery tool would then scale back everything and then report back that this didn't work. So at that point you have something that's up and running, but you have a progressive delivery state, so to speak, on top of that, which says, you know, we tried to do that, didn't work. And then, yes, I would say git revert back to whatever works, you know, gather your data, figure out what went wrong, and go from there. Cool. I think, Andy, because I was thinking a similar question, because earlier, Robert, you said

Starting point is 00:16:46 something about not having any gates to the rollout, but obviously you're still going to be doing checkpoints in production. But I think, number one, that should all be automated in order for this to be proper. And number two, as you said, all of your hardcore testing should be done before you get to that phase. So the only thing you're going to discover is the unknown of production. And as you're monitoring that, looking for that, then you can make

Starting point is 00:17:11 those decisions. But again, I think where you're going at is no manual gates. It should all be automated. So as you do that progressive rollout, if something goes wrong, you could still go back. It's not like, oh, you have to push it and live with it. So, and Argo has this as well, but I'd have to kind of do it from the Flux perspective, just because that's what I'm used to.

Starting point is 00:17:33 You have the image reflector functionality where you could say, you know, watch this OCI repository. If there's a new version of the container, send a Git commit, update it with the new version. So you kind of always get the newest version, or one based on some sort of policy or regex thing. You could have that, for instance. That is fine, but you probably wouldn't have that in production, right, if you have a big production system,

Starting point is 00:18:07 because it's basically like a better way of using latest. You can't really use latest, because what latest means is that, when you check, it's the latest version. But it's kind of like a good version of that. But you would have that in development. You wouldn't have that in production. So you kind of have to set up your environment based on what makes the most sense. For development scenarios, I usually tend to set up with image reflectors. So every

Starting point is 00:18:40 time we push out something new, it's just going to be there, and it's going to fail or work. Who knows? And then you would have a proper staging environment, depending on how important the application is. And if you have, like I said, Flagger or progressive delivery on top, you kind of have that end-of-the-line, make-sure-that-the-service-is-up-and-running functionality, just to make sure that you don't push a bad version of the service to all your users, at least. By the way, folks, if you're listening in and want to follow up on some of those links, I'm collecting some of the links as we speak. I found the one for the reflector. I also found the blog post that you mentioned earlier from Alexis. I will add this. A question for you, because this is a topic that comes up a lot. People like the kind of, in production when load increases,

Starting point is 00:19:40 and then the automated scaling, automated upscaling and downscaling. How does this fit into the real, quote-unquote, best practice implementation of GitOps? Would it be that you need to specify the new replica sets in a pull request, or is it still okay to use something like, I don't know, KEDA or HPA, where they take care of scaling and then you say the desired state is still met because in the desired state you specify the range of how you want to scale? You can definitely do that. And there's always the possibility that GitOps doesn't work in your scenario, right? I tend to not necessarily have any kind of, again, it depends on the deployment, depends on the scenario, etc.

Starting point is 00:20:52 But if possible, I like to have a specific number of pods or replicas, and I have the kind of bursting, so to speak, for those tools to implement. The thing about GitOps is that it shouldn't change something that it isn't responsible for. So if you have something like a Horizontal Pod Autoscaler on top, that should work as intended, while the kind of source might differ. And again, implementation details. So I would double check that. But in most of the cases where I set up GitOps, it's usually not that necessary to have that

Starting point is 00:21:48 auto-scaling set up in that fashion. However, worst case scenario, you would auto-scale when there's a lot of traffic, and the continuous reconciliation would then kind of pull that down again. And you would see that it auto-scales straight up again. So it's probably best to just set the desired state to something higher, right? And the normal reconciliation loop timer for at least Flux is, I think, 10 minutes by default. You could set that to something much higher if you have a lot of automation around there to make sure that the traffic gets served at least.
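The tug-of-war between an autoscaler and continuous reconciliation can be made concrete with a deliberately simplified model. The functions `reconcile` and `autoscale` below are hypothetical and do not reflect how Flux, Argo CD, or the Horizontal Pod Autoscaler are actually implemented; the sketch only assumes that reconciliation enforces the fields that appear in the desired state and leaves everything else alone.

```python
def reconcile(desired, actual):
    """Return the actual state after one reconciliation pass.

    Only keys present in `desired` are enforced. Any other key in `actual`
    is left untouched, because the agent is not responsible for it.
    """
    result = dict(actual)
    result.update(desired)
    return result


def autoscale(actual, load, per_replica=100):
    """Hypothetical autoscaler: one replica per `per_replica` requests, minimum one."""
    needed = max(1, -(-load // per_replica))  # ceiling division
    return {**actual, "replicas": needed}


running = {"image": "app:2", "replicas": 2}
running = autoscale(running, load=450)   # traffic spike, autoscaler adds replicas

# Case 1: the replica count is pinned in the desired state, so the next
# reconciliation pass pulls the autoscaler's change back down.
pinned = {"image": "app:2", "replicas": 2}
after_pinned = reconcile(pinned, running)

# Case 2: the replica count is left out of the desired state, so the
# autoscaler's value survives reconciliation.
unpinned = {"image": "app:2"}
after_unpinned = reconcile(unpinned, running)
```

In this toy model the choice is between pinning the replica count in the desired state, accepting that reconciliation will undo autoscaling, and leaving it out so that another controller owns that field.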

Starting point is 00:22:34 Yeah, because the big debate that at least I always hear is how do we apply a change to the production system when we are reacting to problematic situations? Do we just automatically execute a kubectl and change the replica sets, or will this then be conflicting with our GitOps approach because GitOps will then scale it down again, which makes a lot of sense because GitOps, as we explained earlier, in the reconciliation loop is trying to

Starting point is 00:23:09 always bring the actual state in line with the desired state. And so, I mean, I love the definition you had earlier, right? Because if you're doing GitOps right, everything, your whole state, is declared in Git. And you should always keep it up to date

Starting point is 00:23:26 because then every change is version controlled. You can always go back, you have a full audit trail of everything that you've done, which is great. In reality, I see a lot of people are still then saying, okay, but if we need to scale up, do we then just change things on the fly? No, to do it right we actually need to go in, open up a pull request, have somebody or even some automation

Starting point is 00:23:50 obviously confirm the change, and then let Flux or whatever you use apply it. Yeah, it is one of the things that I talk about when I talk about GitOps: if you're trying to give teams the opportunity to be autonomous and control their own deployments, they should own the repository that drives GitOps for their system. I tend to not have pull requests for deployments at all. Usually the teams that I work in write directly to main. Mm hmm. And the reason for that is, again, because all this is just the deployment phase of it.

Starting point is 00:24:49 Unless there's something very specific happening, there's not a lot of eyes that need to be involved to change numbers of replicas and stuff like that. Or maybe if there's updating of versions on particular containers or something like that, you would do a pull request, but you can do that even though you can write to main. So there are a lot of people doing GitOps who work directly in main with no pull requests. And at that point, what's the difference between running kubectl scale deployment or just updating it in Git? The thing is, if you're doing it in Git, you have the audit trail and the visibility that that happened. It's not people sitting on

Starting point is 00:25:40 their own machines just doing random commands. And that would also mean, it's also part of the thing about GitOps, it's easier to secure those deployments, because you don't have to have people with direct access to clusters. People can change this through Git. And it's a lot easier to set up GitOps, having a cluster reach out to Git, getting the changes and pulling them in, than having a pipeline and then giving access to the cluster from the pipeline. If you're doing, for instance, again, Azure DevOps pipelines, try whitelisting Azure DevOps pipelines IP addresses. It's impossible. You're right, yeah.

Starting point is 00:26:34 They change all the time and there's hundreds of them. So you can't really do that. But what you can do is block all access to the Kubernetes API from the public internet, and then the cluster can reach out to Azure DevOps and get the changes from the Git repositories there. So there's a security aspect here as well that simplifies things a lot. Yeah, yeah. Sorry, go on.

Starting point is 00:27:00 I was just going to... It's you, Brian. You can do it. I was going to say, going on your idea of the scaling and the write to main, just thinking about obviously our own tool set, it sounds like something we could do then would be, if you're monitoring those situations and you see that you need more scalability, you could then use the automation workflow or something to write to main in your Git repository to scale

Starting point is 00:27:31 and automate that cycle, and then under certain thresholds push again to bring it back down. And I guess, as Robert was saying, if you're doing it as a pull request, then you have to go through the whole cycle of approvals and everything. So I think the key there is then defining rules of what you can do directly in main and what you can't, and then enforcing that somehow. That's going to be the people challenge. But at least in practice, it makes a lot of sense.

Starting point is 00:28:01 Yeah, and again, platform engineering is kind of also one of the topics that I talk a lot about. The idea behind platform engineering is to make it so that teams are able to control their own deployments. If you have a team that has a service, they are creating that service, they're developing it, they should also be able to do the deployments. And if they would just basically get a Git repository with the deployments of their service and no access to anything else in the cluster, because obviously they can't

Starting point is 00:28:40 connect to the cluster, they're not getting access to the Git repositories where all the underlying infrastructure, like the server managers and the secret management stuff and all those kind of things. That's not what they're able to get. They just get the deployments themselves. Then they should be able to just do whatever they want because they're responsible for their service. What I wanted to say earlier is thank you so much for going into that much detail. I know sometimes I'm just really interested and then I want to just dig a little deeper and thank you for being open to that discussion. Now we just opened up another idea

Starting point is 00:29:19 or question, but I think you already answered it, but still from a confirmation perspective. So you are obviously saying clear separation, best practice between your repositories for your application services that your developers are building. And then I guess the underlying platform itself. And then is there another layer where you say platform services? You mentioned, I don't know, maybe database services, service management. In a typical setup that you've seen, how is GitOps structured in terms of who controls and owns which type of repository that is defining what type of the desired state? So I'm going to go to my default answer, which is it depends.

Starting point is 00:30:08 Yeah, OK. But it really does, because if you are in a startup with five technical people and you're doing everything, you could have one monorepository for every single cluster and all the services, right? Because you are just five people doing that. If you are a big enterprise, if you are maybe someone with some really weird requirements because you're a telco or you're, whatever, health, you know, governmental health enterprise stuff, you probably can't just do that.

Starting point is 00:30:52 I would obviously promote inner source as much as possible. So even though people can't actually write to stuff, they should be able to see it because it just helps people to figure out what's going on. But if you're in that scenario, giving teams a repository for their application and then obviously setting up all the automation around that so you get to a set table, so to speak. There's a Norwegian term that I kind of translated, that didn't work. But you get to a point where everything's just ready. You get your repository,

Starting point is 00:31:32 you just put your manifests or whatever you need in there, and then the automation is done. That is a good scenario for developer teams to be in. Then obviously you have some scenarios where you're at a company where the developers really don't want to touch Kubernetes stuff at all. That happens as well. And then you kind of have to have, maybe you need something like Backstage to have the kind of golden path scenario. I would prefer if you did that, then that would actually get written to a Git repository when

Starting point is 00:32:14 you're done. So there's a lot of different ways of setting that up, which is why, again, kind of the default answer: it depends. So I don't really believe in best practices. There are patterns and anti-patterns, and some patterns might work for some companies and might be the worst pattern ever for another company. I mean, I believe that too. One of the things I can take away from what you just said is don't over-engineer things, especially for a small company that might not need it. But also take into account all of the requirements your organization has, because there are certain things you have to do maybe, and then figure out the best way that works for you.

Starting point is 00:33:01 The fact is that you can, as you said, you can do GitOps from one big monorepo with everything in there. But depending on who you are, you may want to separate infrastructure. You may have to separate infrastructure from the individual platform services, from the individual apps.

Starting point is 00:33:20 And if you have, for instance, a shared Kafka service, that might live in one Git repository just because, and especially if you have a lot of teams, the customers that I've been at that are big enterprises that have those kind of shared services, the number of teams that need to be onboarded into, for instance, like Kafka, the topics that they need to have created, or for instance, if they have HashiCorp Vault or something like that, the number of teams and namespaces and services, et cetera, et cetera, et cetera. If you have everything like that in a monorepository at that point, first of all, it gets hard to control, because Git is not the easiest thing

Starting point is 00:34:09 to secure, like, fine-grained secure. You can usually do stuff more on a repository level. And then you obviously have CODEOWNERS, where you can kind of, you know, hit some part of the folder or directory that you're in. But you can't set a lot of specifics. So it just gets easier having several Git repositories at that point and giving people access based on that. Thank you so much. You also mentioned in the beginning, or at least when we talked about this briefly, the OpenGitOps project.

Starting point is 00:34:51 We will link to it as well in the description. I know that Christian Hernandez, who we had on the previous podcast, he also mentioned it briefly. Just maybe a couple of words from your side, OpenGitOps, why should people look into it? What problem is it going to solve or trying to solve? So OpenGitOps came out of the GitOps Working Group.

Starting point is 00:35:16 It's kind of like the, let's call it the governmental CNCF GitOps project, if you want to call it that, which makes no sense, but it's where kind of the GitOps principles live. There are working groups underneath the project for events. So there's a lot of events happening around GitOps. There's the ArgoCon, there's GitOpsCon, there's a lot of cons, so there are several of them. And one of the groups that it's been a little bit inactive lately,

Starting point is 00:36:00 I think, well, personally, I had a lot of things to do this past year. And a lot of people that used to work at Weaveworks, I think, had a lot of things to deal with. So it's been kind of a little bit slow. But one of the things that we were trying to do, or that we will probably get done this year, is to have a, I think it was nicknamed like fact-checking group. You know, there's a lot of big companies, well-known organizations, that write articles about GitOps, and it's based off the opinions of whoever wrote it. And they do the usual thing: they've heard GitOps and then they kind of just go, obviously I know what that is.

Starting point is 00:36:50 And then as soon as you start reading it, they're talking about pipelines and how to get that up and running. So we have a lot of, we have a group that wants to create stuff around that. There's obviously been talk about white papers, but we haven't gotten around to that yet. But several have been involved in writing questions for the GitOps certification that's going to be coming out really soon. I think we're having a meeting about kind of finishing off all the, everything before release this week or next week or something like that.

Starting point is 00:37:35 So that's, you know, a Linux Foundation or CNCF, obviously, certification for GitOps. And that's where those things kind of come into play. Like, you say you know GitOps? Well, you take the certification. If you start answering, "well, you just point the pipeline and set the trigger to...", yeah, then you're going to fail. Yeah, so we should advocate this podcast to everyone that wants to get the certification, because obviously there are a lot of good pointers to what GitOps is and also what GitOps is not. I think that's

Starting point is 00:38:11 very important. So, opengitops.dev. Yeah, that's where people can... Yes. And that's also where people will find information about the certification, I would assume? Yeah, there's going to be a blog post about that. We do blog posts from time to time. I'm going to write some really soon, hopefully, again, time permitting. There's the event page, and the principles are there, version 1.0.0 of them. Obviously we put the principles in Git and versioned them. So you will know if there's going to be an update later on. You go back and check the desired state. Hey, I told you in the beginning that I will see you in Stockholm.
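
As a companion to the pipeline-versus-GitOps point above, here is a minimal Python sketch of the "check the desired state" idea: an agent compares the desired state (as it would be read from Git) with the actual state and converges the two, rather than waiting for a pipeline trigger. This is only an illustration under stated assumptions. The names `Change`, `plan`, and `reconcile`, and the use of plain dictionaries for state, are hypothetical and do not come from the episode or from any particular GitOps tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One step needed to move actual state toward desired state."""
    action: str  # "create", "update", or "delete"
    name: str


def plan(desired: dict, actual: dict) -> list:
    """List the changes needed so that actual matches desired.

    Both arguments map a resource name to a version string. In a real
    setup, desired would be read from Git; here it is a plain dict
    (an assumption made for illustration).
    """
    changes = []
    for name in sorted(desired):
        if name not in actual:
            changes.append(Change("create", name))
        elif actual[name] != desired[name]:
            changes.append(Change("update", name))
    for name in sorted(actual):
        if name not in desired:
            changes.append(Change("delete", name))
    return changes


def reconcile(desired: dict, actual: dict) -> dict:
    """Return a new actual state with every planned change applied."""
    converged = dict(actual)
    for change in plan(desired, actual):
        if change.action == "delete":
            del converged[change.name]
        else:
            converged[change.name] = desired[change.name]
    return converged


if __name__ == "__main__":
    desired = {"web": "v2", "db": "v1"}
    actual = {"web": "v1", "cache": "v1"}
    print(plan(desired, actual))
    print(reconcile(desired, actual))
```

An agent would run `reconcile` repeatedly, so drift in the actual state gets corrected even when nothing new has been committed. That is the part a one-off pipeline trigger does not give you.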

Starting point is 00:38:58 And I want to make sure I don't make a fool out of myself when I get on stage and talk about platform engineering. Yes, yes. What is there that I need to know from you about platform engineering that maybe also other people get wrong? There's a lot of things people get wrong. Not that I have all the answers. I just, you know, again, I kind of care about these topics. So I kind of deep dive into them. And there's a lot of,

Starting point is 00:39:34 when I talk about platform engineering, I usually bring up the fact that, you know, platform engineering is the modern IT operations. It's not something that pushes out DevOps, which a lot of people have been promoting on various social media: that DevOps doesn't exist anymore because there's platform engineering. I was on that panel, "DevOps is dead", and I was fighting for "DevOps is not dead", but I think it was a strange thing that we did here.

Starting point is 00:40:07 I have isDevOpsDead.com, I think, registered. And with my awesome web development skills, HTML, I just wrote "no". So, you know, that's my opinion on that. But yeah, so that's one of the things that I usually talk about. Again, it's the fact that people don't care to actually, you know, look into what terms mean. So when DevOps became a thing, people just took, you know, a basic cloud engineer and said, all right, you're DevOps now.

Starting point is 00:40:47 So you have DevOps engineer, which doesn't make sense. And then people obviously say, that's why platform engineering came in, because people should stop being DevOps engineers. They should be platform engineers. And that kind of makes sense. But it doesn't make sense that DevOps engineer is a thing. But the most important thing about platform engineering for me is that it is the modern way of doing IT operations. You should treat your platform as a product.

Starting point is 00:41:17 Your developers that are using this platform or external or internal clients, as we call them, they are the people that you should cater to. If people want to have certain functionality, don't just dismiss them and say, you're stupid, that's not a thing. They're your clients. You need to figure out how to make your clients happy. That's the entire idea of platform engineering.

Starting point is 00:41:42 That's why we're creating these platforms. But that doesn't mean that you have to, you know, have a Backstage deployment. It doesn't mean that you have to create your own API. There are different ways of fulfilling that mindset. And I'm one of the co-chairs of the Platforms Working Group under TAG App Delivery. And we created the platforms white paper, which describes what a platform is, which is a set of capabilities. You don't necessarily have to have every single capability

Starting point is 00:42:20 at first glance. That's what's referred to as the thinnest viable platform: you just need to get something up that is sufficient for people to get their job done, and then you iterate on that. But the platforms white paper goes a lot into what the platform is. And then we have the platform maturity model, which deep dives a little bit more into what platform engineering basically is, all the aspects that are involved in defining platform engineering, and, you know, how mature one is in certain aspects. For instance, you could have a very ad hoc platform engineering team, people just wanting to create a platform and making it easy for people to do development.

Starting point is 00:43:15 It's probably not going to get very far because you kind of have to work against the top-level people. You need top-level support. It's not bottom-up or top-down. It's kind of like both ways. That's the only thing that actually works. A bottom-up approach is fine, but you're going to get hammered on top.

Starting point is 00:43:37 And the same way down: if you're trying to push down changes, people are going to revolt. It's not great. So everyone needs to be involved. And there are so many ways to solve all these issues. But the platform maturity model is a good one to look into to see what you can tackle next.

Starting point is 00:44:00 Yeah. I took a lot of notes, and also thanks for the reminder about the white paper and the maturity model. I've unfortunately only been in your working group once or twice because the time didn't work out. A good reminder for me to get more actively involved. I've promoted the white paper in some of my presentations and definitely will in my talk. For me, what you said is spot on, and this is the way I explained it in one of my recent talks when I was asked. I said,

Starting point is 00:44:35 the most important thing is that you need to understand the pain points of your users, because if you don't know what your users really need and you just start building something based on what you think they need, then you are bound to fail because you're not solving a real problem. So I think that's why I like your explanation

Starting point is 00:44:53 with it has to have top-down support, but it needs to have bottom-up requirements. This is why. And also I think, and I wanted to get your take on it, if you're doing platform engineering, I also don't think you have to have 100% solutions to the problems that the developers have

Starting point is 00:45:14 because I think you should focus on the classical 80-20 rule: figure out where the biggest pain is and where you can have the biggest impact by providing something, like you said earlier, with Backstage, with a simple templating engine, right? Where you can take away the pain of dealing with things that developers shouldn't deal with. They should just focus on their application definition.

Starting point is 00:45:39 Yeah. I prefer to create an API for my platforms. That's what I prefer to do. But that takes a lot of work. It takes a lot of finding all the right tools and making it all work together. In the meantime, a wiki page that explains how to do the deployment to your environment is fine.

Starting point is 00:46:02 It means people can do stuff. And obviously, you're then going to talk to your developers or your clients or whatever you want to call them. You know, what about this is painful for you? Like, what can we do to make this better? And it's also a little bit from the old school of, what was it Ford had said, you know, "you can get any color you want, as long as it's black" type thing. If you have someone,

Starting point is 00:46:40 you know, if someone needs a NoSQL database, do they necessarily need to have, for instance, MongoDB? So if your developer comes and says, I want MongoDB, do they need that specifically? Or could you solve that with something that's easier for you to support underneath? So for instance, and you can do it kind of smoothly, because if you work at a place that is Azure specific and you have

Starting point is 00:47:15 developers and they want to use MongoDB, you don't necessarily have to get, what's it called, MongoDB Atlas, or set up MongoDB in Kubernetes or something like that, and then have something outside of the scope of your platform team, and they need to learn something new and so on and so forth. What they can do is use, for instance, Cosmos DB with the MongoDB API. And from the developer's perspective, they're just connecting using the MongoDB drivers. And they would get basically the same functionality. There might be slight differences depending on, but in most cases, it's probably going to solve it. And at that point,

Starting point is 00:47:57 it's more about the capabilities. They need NoSQL. They might have some preference, like how they connect to it, but they need NoSQL or unstructured databases. And you can solve that without having to cater to someone saying, this specific technology, that's what we need. You could still go, no, we're not going to set that up, but what we can do is solve it by adopting this technology that's easier for us to support. So it's a little give and take, but at least you're not in a typical scenario where you have a database admin and you go and ask for something and they go, well, we're only doing Oracle databases. If you're trying to get anything else, we're just going to say no. In the end, it comes really back to what you said in the beginning where you said, it's

Starting point is 00:48:55 really that a platform needs a product mindset. And with any product development, like Brian, if I look at our product, a lot of our customers are asking us, I need this particular chart feature and I need this particular color for my chart. And then I think the challenge of a product manager is to say, what are the big pain points and how can we solve the real problem that they have? Because a different shade of red or green is not their problem. I think they want to solve something different.
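
The MongoDB and Cosmos DB example from a moment ago can be sketched in a few lines of Python: the platform resolves a request for a specific technology to the capability behind it, then to whatever the platform team actually supports for that capability. This is a rough sketch under stated assumptions. The catalogue contents, the capability name `document-database`, and the names `Offering`, `SUPPORTED`, `REQUEST_TO_CAPABILITY`, and `resolve` are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offering:
    """What the platform team supports for a given capability."""
    capability: str
    backing_service: str
    client_api: str


# Hypothetical catalogue for an Azure-focused platform team.
SUPPORTED = {
    "document-database": Offering(
        capability="document-database",
        backing_service="Azure Cosmos DB",
        client_api="MongoDB API",
    ),
}

# Hypothetical mapping from what people ask for to the capability they need.
REQUEST_TO_CAPABILITY = {
    "mongodb": "document-database",
    "nosql": "document-database",
}


def resolve(request: str) -> Optional[Offering]:
    """Map a requested technology to a supported offering, if there is one."""
    capability = REQUEST_TO_CAPABILITY.get(request.strip().lower())
    if capability is None:
        return None  # Nothing known yet: time to talk to the client.
    return SUPPORTED.get(capability)


if __name__ == "__main__":
    offering = resolve("MongoDB")
    print(offering)
```

A request that resolves to nothing is not a rejection. It is the cue to have the conversation about what the client is really trying to get done.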

Starting point is 00:49:23 And then you figure out how you can solve it for not only a single person but for the majority of people that have this requirement. And it's the same thing here, what you just said with... Solve the problem, not the request. Yeah, exactly. Yeah. And we have actually started work on a platform as a product white paper that goes specifically into those kinds of things. So if people want to contribute to something in the CNCF, that is an active thing that's going on right now. If you're going to be in Paris, they're going to do questionnaires there, interview people about that to get research for that kind

Starting point is 00:50:11 of white paper. Perfect. And TAG App Delivery is driving this as well? Yes. Yeah. So that's us in the Platforms Working Group under TAG App Delivery. So folks, if you're listening in, then you will find all of these links to TAG App Delivery, the Platforms Working Group, and also the white papers we discussed in the summary of this podcast. Nice. Brian, do you have any further comments or questions? Yes, I have two scenarios I would like to see play out in Stockholm. I would like either to see Andy try to define GitOps and Roberth stand up and yell at him and tell him he's got it wrong. And that he's just spoiling it for everybody.

Starting point is 00:51:07 Or if that doesn't happen, I'd like to see Andy surprise Roberth by inviting him on stage to say, hey, why don't you come define GitOps? That's not really a surprise if you ask right now. Well, I'm saying this is the scenario I want. That either will happen, but yes. Seriously, though, Roberth, this has all been fascinating.

Starting point is 00:51:28 A lot of this, and I'm looking forward to the white paper as well on defining GitOps because even as you're describing, I'm a lot slower than Andy. I'm starting to wrap my head around it a little bit, but I'm going to need more on that at some point to really find the difference there. I'm looking forward to all that,

Starting point is 00:51:56 and hopefully we don't end up with GitDevTestSecBizOps. Yeah. If someone wants to start the GitSecOps movement, no, we don't do that. We don't need it. You know what's interesting? I mean, there's the whole thing

Starting point is 00:52:18 with better definitions of terms and it's not just the GitOps and DevOps and DevSecOps. Today, I had a discussion with somebody and the term observability as a service, that's because we are living in the observability space, right? And I have a completely different definition of what observability as a service means

Starting point is 00:52:41 versus the person that I talked to, because we came from completely different backgrounds. For me, observability as a service in the context of platform engineering is to make observability a platform feature, so that as a developer, if I'm using Backstage and I check the box, I want my logs, my metrics, my traces, I get it and I don't need to figure out

Starting point is 00:53:03 is it OpenTelemetry to Grafana, is it Dynatrace, is it Datadog. And that other person had a completely different thing. They thought about observability as a service as an outsourced service, where somebody else is taking over observability for an enterprise. And to your point, the definition of terminology is so important because we have

Starting point is 00:53:29 completely different understandings depending on who we are, where we come from, what we do in our daily life. And it's all about abstraction layers. It's, you know, talking to people about Azure Kubernetes Service, saying it's a managed Kubernetes service. And then they go like, but, you know, it's just the servers and cluster operations that's managed.

Starting point is 00:53:56 I have to still do Kubernetes. It's like, well, yes, you know, that is a managed Kubernetes service. Now you just have to use the Kubernetes. If you don't want to do Kubernetes at all, you need to go another abstraction layer up to something else. And that is kind of like the thinking, again, people hear something like managed Kubernetes, it's like, oh yeah, the entire thing, everything about Kubernetes is managed. So not really.

Starting point is 00:54:27 It's the same with GitOps and DevOps and all those things. They just hear the term and kind of go, yeah, this is what it logically means to me, so it must be. I wish more people, you know, when they hear stuff that they haven't heard about before, took a little bit of time to look into the details. They don't need to spend a lot of time, but spend five minutes looking into it. It would be great. You could say that about every situation. Yeah, so I don't have to explain stuff over and over again.

Starting point is 00:54:59 Well, now you can just point them to this podcast recording and say, folks, if you haven't yet figured out what GitOps is, listen to this podcast. And then you'll pass the test if you listen to it. Get certification. Alright. Well, Roberth, thank you so very much. Really appreciate you being on today.

Starting point is 00:55:17 And I hope everybody learned quite a lot from this episode. I know I did. I hope so too. It's always great to have very, very smart people on, so appreciate it. I hope you start feeling better, and thank you, and yeah, thanks again for being on. Really appreciate it. Thanks for having me.
