PurePerformance - Why Developer Observability is not a tooling problem with Viktor Farcic

Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Welcome everyone to another episode of Pure Performance. As always, I'm not Brian Wilson actually, I'm just Andy Grabner. Because Brian is no longer on his vacation, but hopefully he's fast asleep, because at the time of the recording, it's in the middle of the night for him in Colorado. So I'm doing this session solo, but I found an awesome guest that I have wanted to have on the show for many, many years. Viktor, Viktor Farcic, welcome to Pure Performance.

Starting point is 00:00:53 Thank you for having me. Yeah, I thank you for taking the time out of your busy day. Viktor, I know it's really hard to not see content of you when you look on YouTube and you search anything around DevOps, around Kubernetes, infrastructure. Crossplane is a big topic of yours, obviously. But can you quickly remind the listeners that may have not seen any of the content or have never heard about you, who you are, kind of a little bit of a background and what you do in your day-to-day life? So I'm officially a developer advocate at Upbound. Upbound is a company behind Crossplane. And on top of,

Starting point is 00:01:39 I mean, I said intentionally, officially, because I tend to change what I do and what I'm interested in. You know, like a kid that requires a new toy. Whomever has a five-year-old knows this kind of, you give a toy and that toy is boring after an hour or something like that. That's me, right? I like playing with stuff. And I like trying to figure out how those things... And just to be clear, I pick mostly randomly stuff or tools or platforms, right?

Starting point is 00:02:11 No criteria at all. I just want to know what it does. And I just want to know how I can plug it into something bigger into the system, right? And then those videos and many other things that I do, and everything I do is public, end up in some form or another, right? Lately, mostly YouTube videos. Yeah. So, folks, if you listen to this and you want to follow up with Viktor,

Starting point is 00:02:38 either connect with him on LinkedIn or watch the YouTube video, all the links are in the description of the podcast as always. Viktor, I've seen many of your presentations at different events recently, the KCDs or Cloud Native Days. The last one that I remember vividly was I think the best failing demo ever. This was in Zurich

Starting point is 00:03:01 when you and Whitney were on stage and you were supposed to give a talk, kind of choose your own adventure, and all of your demos failed. But I think it was still one of the best created sessions, just because of the way you delivered it. Yeah, from the demo perspective,

Starting point is 00:03:18 it was a disaster. It's not that one thing failed, but a number of things failed. It was a cascading effect. Nothing worked, but we recuperated from it and just turned it around and did it without. People liked it, so it was fine. Yeah, and I really liked the, I think in general,

Starting point is 00:03:39 the talk was interesting because you, as you said, you allowed people to pick one tool out of a category of tools before building, let's say, your platform on Kubernetes, whether it was around security, whether it was around infrastructure as code, whether it was around GitOps.

Starting point is 00:03:56 And then because you looked into all these different tools, as you said, and then you were supposed to do kind of like an end-to-end demo, but you could just pick and choose whatever tool the audience wanted. Yeah, nevertheless. Yeah, I had conversations with people before, and still do, and very often they try to convince me that, hey, you should record your demos, right? You should just play it on a button, or there are some other tools

Starting point is 00:04:25 that you can just press space and it looks like it's typing. And I was always, I'm going to do a live demo. And now, in spite of what you're saying, I'm going to complicate my life even more, in that I'm going to let people choose the direction. So there will be 17 demos

Starting point is 00:04:41 randomly chosen in a talk. While I'm saying this, I realize that I'm probably some kind of masochist or something like that. I mean, it's... Whatever you want to call it, I still think keep doing what you're doing because it's educational,

Starting point is 00:04:58 it's entertaining, so it's edutaining, as I like to call it. Viktor, there's two topics that I would like to discuss with you today. Also, that you suggested, like one was around developer observability or observability for developers. Because I know you told me that, you know, observability is not your, let's say, focus topic. But I'm pretty sure you have an opinion or at least you see observability for developers from your lens.

Starting point is 00:05:29 So that's topic number one. And topic number two, you said you are the DevRel for Upbound. You are promoting tools like Crossplane. So I see more and more of the people that I interact with also looking into tools like Crossplane for cloud native orchestration,

Starting point is 00:05:46 for infrastructure as code. And so I would touch base on these two topics with you today. Sounds good. Let's kick it off with developer observability. When I say developer observability or observability for developers, what does this mean for you? What are the key things that you see people are getting wrong or they need to focus on? Let me backtrack for a second. I feel that we have two different types of needs

Starting point is 00:06:15 in the software industry, right? One is that you need tools and processes and whatnot that are designed to cater to the needs of you as a professional in some field, right? Like, if you're a security professional, you have tools that are specialized in security. If you're a person, if you're actually in charge of production, you have specific tools to observe what's going on, get alerted and so on and so forth.

Starting point is 00:06:46 And that continues, right? You can say that for every area of software industry, right? We have specialized tools for specialized tasks. But then we often have to perform tasks that are not necessarily our specialty, but we need to do it. Like, for example, if you're a Node.js developer, you are working on a frontend application, you still need to, let's say, make it secure. Right? And you still, hopefully, you should know how it performs, right? You should observe it, whether that's in production or somewhere else. But you should dive into observability, but that's not your main specialty.

Starting point is 00:07:30 You do not know even a fraction of what people specialized in it know. That's how I think of those things in general, and that includes observability. And from that perspective, I don't think that we can give the same tools or same processes or same information to both types of people, right? Whatever is good for, let's say, somebody in charge of managing production observing things, whatever is good for that person is going to overwhelm the other. Because you see 57,000 different metrics and you say, I have no idea what this is. I don't know what's happening. And the other way around. If a company makes a tool that is catering observability for not just developers, then those who really take it

Starting point is 00:08:26 seriously or dedicated to it are going to say this is not enough. So you either do not satisfy one group or you need different types of tools. It doesn't have to be necessarily literally different type of tool, but it needs to look and feel different. What's behind the scenes, I don't think that's the subject here. And I think that's very, very hard, and I think that we are not even close to that. Because, hey, let me give you a simple example. We want to give developers access to logs, right?

Starting point is 00:09:09 But we don't have means to give them logs that they need. We can give them everything that application or system spits out, right? And that's very confusing. And the same thing goes with metrics and traces

Starting point is 00:09:23 and so on and so forth. I don't think it's a, just to be clear, I don't think it's a technology problem. I think it's a problem that those specific experts need to work harder to generate that as a service, to generate something specific for others instead of giving them their own tools. Like, you know, here you go. Here is Grafana or Dynatrace or whatever you're using that I'm using, you use it as well. That does not work. Yeah, interesting use case that you brought up. And also from an analogy perspective,

Starting point is 00:09:57 let me first give you my analogy because this is something that just bothers me a little bit, where I have struggles. Video production, right? You produce videos, I produce videos. And currently we're looking into using tools like Adobe Premiere, which is a great tool, but it's very overwhelming for me, for somebody that used to just click on the record button in Zoom, and then a video came out and that's what I shipped off. And then it just got published by one of my colleagues. Obviously, I can also use a super fancy tool like Adobe Premiere and I can make Hollywood style productions.

Starting point is 00:10:33 But it's very hard for me. And I wondered, should I, as a non-expert, deal with this complexity? Now, we will have a middle ground somewhere where I have some of my colleagues who help me come up with templates so that I can easily just upload my video into Adobe Premiere, click three buttons, and then I get a really good result without ever having to become an expert in that particular tool. And last thing on this, because I think standards is a big topic, talking about OpenTelemetry, but here on videos, right?

Starting point is 00:11:04 When I record with whatever tool you choose to record, there's an MP4 file coming out. And that can then be repurposed by many tools to make it prettier. And I can even give it to somebody else. And I think that's also a big part of this. Of course. I mean, you can put equation between raw material, that video that you recorded, and data, right? We have no problem with those. I can even argue that the problem wouldn't necessarily be to learn Adobe. You can use a different tool to edit videos, you can also use Premiere, right? Premiere itself is not a problem.

Starting point is 00:11:40 But if you take a Premiere project of somebody who is into it, you would see like there would be like 100 different layers on top of it, right? Then you would get even more lost, right? So you can probably learn, okay, so in Premiere, I'm not using Premiere myself, I use Final Cut Pro, but you can probably learn in an hour that, oh, you need to cut. This is how you cut your video and two more things and you're done. Just never, ever, ever work on a project done by somebody else. Then you're in a bad shape, I feel. Coming back to the log example, because I think this is interesting for two aspects. We're currently writing a book with

Starting point is 00:12:23 two other authors on platform engineering. And one of the use cases is exactly the one that you said. I want to have a platform where I as a development team can say, I need the logs that are relevant for me

Starting point is 00:12:36 and I don't want to get everything else. And one of the ways we suggest to do this is, when you ingest logs, giving enough context, like who am I, who do these logs belong to? And then you can provide automation that is then pushing relevant logs to the teams, into the tools where they are. Maybe creating a ticket in the ticketing system, maybe pushing it into the IDE or anywhere else.
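Editor's note: the routing idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any product's API: the `owner` field, the `ROUTES` table, and the destination names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical routing table: owning team -> where that team wants its logs.
ROUTES = {
    "checkout-team": "ticketing",
    "search-team": "ide-plugin",
}

def route_logs(records, routes=ROUTES, default="platform-team-inbox"):
    """Group log messages by destination, using the owner attached at ingest."""
    outbox = defaultdict(list)
    for record in records:
        owner = record.get("owner")  # context added at ingest time (assumed field)
        destination = routes.get(owner, default)
        outbox[destination].append(record["message"])
    return dict(outbox)

logs = [
    {"owner": "checkout-team", "message": "payment timeout"},
    {"owner": "search-team", "message": "index rebuild failed"},
    {"message": "node disk pressure"},  # no owner: stays with the platform team
]
routed = route_logs(logs)
```

Records without ownership context fall back to the platform team, so nothing is silently dropped while teams only receive what belongs to them.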

Starting point is 00:13:04 But you're completely right. This whole thing of we have everything, but if I give you everything, you're completely overwhelmed because this is not what you normally do. And the question is, how can we automate that expertise to make it simple for people to get the data that they really need. I can illustrate the direction where I'm thinking. I feel that I can illustrate with Crossplane, which we'll talk about later, but now very quickly. A while ago, me and a few other folks started a conversation

Starting point is 00:13:41 precisely on that issue. And the story is following, right? With Crossplane, just as with Terraform or Ansible, you can say, hey, I want to create let's say a Kubernetes cluster, and in AWS, it would create EKS, node group, VPC, subnets,

Starting point is 00:13:56 and so on and so forth. A single cluster would have, I don't know, like 20, 30, 40, 50 different resources because that's all, when you do it seriously, you need all those things, right? And we want developers to be able to do that and they just fill in a couple of fields,

Starting point is 00:14:15 you know, create a YAML of 10 lines. They don't deal with those thousands, tens of resources. They just say, I want the cluster, I want it in this zone, go, right? And that's great. So day one works, cool, checked. And then something goes wrong,

Starting point is 00:14:30 and they would need to figure out, okay, so first of all, is it VPC? Is it subnets that is not working? Is it this? Is it that? And I have no idea what those things are. Vast majority of people have no idea how subnets work, and that's okay, right? And then we said,

Starting point is 00:14:48 okay, but you can see the logs, and in case of Crossplane, Kubernetes events, and the answer is still the same. I wouldn't use this tool to have an easy way to create a Kubernetes cluster if I would know what all those things are.

Starting point is 00:15:04 So the idea right now, and that's kind of in progress, is, okay, how about if the person designing, we call it composition, the person designing that interface can decide what goes to developers and say, okay, you know what?

Starting point is 00:15:20 If there is a problem with, I'm simplifying it now, if there is a problem with VPCs, just send back to the resource that created it: hey, contact Joe, this is a problem you cannot solve.

Starting point is 00:15:35 And if the problem comes from EKS itself, only show logs that contain this word, or somehow filter them and show a subset, and so on and so forth, right? Somehow to make a decision of what matters to the user of that something

Starting point is 00:15:56 and what is everything else. Because I need everything, just to be clear. I need everything. You don't. Now, I'm not trying to explain Crossplane itself, but I feel that since many people are

Starting point is 00:16:11 very much into building developer platforms of some form or another, that that's a subject that we are not talking enough or working enough. How can we filter raw information and present it to others? Yeah, so basically

Starting point is 00:16:28 providing different layers of information. We have raw data, and out of the data we're creating different layers of information for different use cases. And because we both like analogies, the analogy that comes to mind is: if I drive my car, I get certain lights in my dashboard that indicate I need to do something. Tire pressure, gas. But if the engine light goes up red, I know I need to go to the next station, like to the next repair shop, because I cannot do this anymore. Exactly. Exactly. There are things that you should know about because you can change a tire maybe, right? But you cannot fix the engine. So the tire would be you need information exactly which tire, which exact tire to change. Kind of like you need details. Engine, just that light. You're not going to do it yourself. Go somewhere. Another analogy to me is that if

Starting point is 00:17:29 you, AWS itself, let's say, right? Or any cloud provider, you will see information and logs, but it doesn't necessarily look like that, but you will not see all the information, right? You will not know what's happening with hypervisors when you create EC2 instance, right? Because that's behind the scenes. That's AWS's problem. You're not going to bring a new server to fix your problem. You're not going to see it.
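Editor's note: the idea that the person designing the interface decides what reaches the developer can be sketched as a small rule table. This is a hypothetical illustration in Python, not a Crossplane feature or API; the `RULES` structure, the `quota` keyword, and the event fields are assumptions made up for the example.

```python
# Hypothetical rules written by whoever designs the interface.
RULES = {
    "VPC": {"action": "escalate", "contact": "Joe"},
    "EKS": {"action": "filter", "keyword": "quota"},
}

def developer_view(events, rules=RULES):
    """Reduce raw events to the subset a platform consumer should see."""
    visible = []
    for event in events:
        rule = rules.get(event["kind"])
        if rule is None:
            continue  # no rule: the event stays with the platform team only
        if rule["action"] == "escalate":
            # The "engine light": no details, just who to call.
            visible.append(f"{event['kind']}: contact {rule['contact']}, you cannot fix this yourself")
        elif rule["action"] == "filter" and rule["keyword"] in event["message"]:
            # The "tire pressure light": details the consumer can act on.
            visible.append(f"{event['kind']}: {event['message']}")
    return visible

raw_events = [
    {"kind": "VPC", "message": "CIDR block overlaps with existing range"},
    {"kind": "EKS", "message": "node group quota exceeded"},
    {"kind": "EKS", "message": "reconcile loop tick"},
    {"kind": "Subnet", "message": "route table associated"},
]
view = developer_view(raw_events)
```

The platform team still has all four raw events; the consumer sees two lines, one telling them who to contact and one they can act on.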

Starting point is 00:17:55 They filtered it already. Yeah. Another thing on the developer observability, because for me, it's a lot about, as you mentioned earlier, developer getting the right data that they need to understand what's happening

Starting point is 00:18:09 without overwhelming them. I briefly mentioned OpenTelemetry as a standard. I think OpenTelemetry is a standard here to stay. I just, in the previous episode that we have,

Starting point is 00:18:20 we talked with Hans Kristian Flaatten. He works as a platform engineer at the Norwegian government, and they are going all in on OpenTelemetry. And the reason, and this is actually an interesting thought that he brought up, because I asked him, why do you go all in on OpenTelemetry? And he said, one of the reasons is they want to provide stability to developers over many years so that they always have the same framework of instrumenting code and they don't have to change the instrumentation every three years when let's say a

Starting point is 00:18:51 renew cycle comes up of their observability platform and so at least on the instrumentation side they can standardize on something and in the back end the platform engineering team can then decide hey you know we start with do-it-yourself, like everything open source, but maybe in the future they switch it out. But for the engineers at least, they still have the same standard. And to your point, they may use tools like Jaeger

Starting point is 00:19:17 to analyze individual traces because that's catered to their view of what they need to see. Exactly. And that's the key, standardization, because you don't want to be afraid to build something, right? Because many of the companies are blocked by, hey, is this the right choice?

Starting point is 00:19:42 Is this the wrong choice? What will happen tomorrow if I change the choice or whatnot? And OpenTelemetry, to me, it's in a similar level of why we like Kubernetes, which in my head is a similar thing. We have a standard API. It's not about containers.

Starting point is 00:19:58 It's about the API and then we can build solutions on top of it. And that same company, I'm extrapolating now, can tomorrow say, hey, you know what, we're going to keep Jaeger for ourselves, but for Joe, we're going to create a completely different interface. Doesn't matter, right? I mean, it's going to be some work, but you know that whatever is behind that, behind the scenes, is exactly the same.

Starting point is 00:20:24 From an observability perspective for developers, do you see in your case any other need for developers to get observability into obviously their code that's responsible? That makes sense. I give them my logs, give them my traces, my metrics.

Starting point is 00:20:40 Do you see that there's also a need for developers to get observability broader? So should they get observability at least a little bit into the underlying engines where the stuff runs so they better understand what is actually happening, when they deploy, when they scale? Or do you see this more as this is not the responsibility? This is when you will tell the developer, go and call this team. I do feel that developers in ideal situations should have...

Starting point is 00:21:11 Let me backtrack, actually. I do think that everybody should be responsible for their stuff, right? So if I'm in a team that is working on application X, I should be responsible for that application from the beginning to the end. That includes production. And the reason is very simple. I don't want to throw over the wall things because then I stop caring about certain things that I should care about and I'm the only one who knows what that something is.

Starting point is 00:21:43 I'm very skeptical about saying, hey, different team is going to run this in production and be in charge of it because the different team has no idea what you're doing, right? And if you are responsible for a project, for a product end-to-end, you definitely need observability. And that means observability of that application itself, that's a no-brainer. But also, and this is now

Starting point is 00:22:12 what I'm going to say, a bit tricky, system-level data metrics of observability, but in relation to your application, right? So, I don't need to see all the clusters, but I need to see, let's say, memory and CPU of the node where my application is running just in case if it fails, it could be

Starting point is 00:22:34 the fault of my application. It could be because there is not enough memory on that server, right? So basically, I'm advocating for... That falls into the same category. I want filtered information of what people in charge of the whole system see. Just in relation to my application, filtered, modified, changed in a way that I feel very, very comfortable with it, even though I'm not really an expert in it.

Starting point is 00:23:04 Yeah. Yeah, and I think in order to make this a reality, at least one of the things that I see is this whole propagation of ownership, or like context propagation. Like, you know, this is this component, it belongs to this app. And if we can then propagate this ownership, or, let's call it ownership for the sake of simplicity, this ownership information up and down the stack, then I think it's also easier to then provide exactly a filtered view on the data that is relevant for that particular component, for that particular team.
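Editor's note: "system-level metrics, but only in relation to my application" can be sketched as a join between where an app's pods run and per-node metrics. The data shapes, the field names (`app`, `node`, `cpu_pct`, `memory_pct`), and the numbers are invented for illustration; a real setup would read them from the cluster and the observability backend.

```python
def metrics_for_app(app, pods, node_metrics):
    """Return node-level metrics only for nodes that run pods of `app`."""
    nodes = {pod["node"] for pod in pods if pod["app"] == app}
    return {node: node_metrics[node] for node in sorted(nodes) if node in node_metrics}

# Hypothetical inventory: which app runs on which node.
pods = [
    {"app": "shop", "node": "node-a"},
    {"app": "shop", "node": "node-c"},
    {"app": "blog", "node": "node-b"},
]
# Hypothetical node metrics, as the platform team would see them for every node.
node_metrics = {
    "node-a": {"cpu_pct": 71, "memory_pct": 93},
    "node-b": {"cpu_pct": 12, "memory_pct": 40},
    "node-c": {"cpu_pct": 35, "memory_pct": 58},
}
shop_view = metrics_for_app("shop", pods, node_metrics)
```

The team behind `shop` sees two nodes instead of the whole cluster, which is enough to tell whether a failure might come from memory pressure on the server rather than from their own code.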

Starting point is 00:23:41 I feel that what would be awesome without entering into specific technologies is if simply that would be part of an application, right? Just as build scripts are part of an application, you're most likely going to have maven and makefile, whatever you're having inside of the repo of the application, right? And you're going to have pipelines as well, or workflows, inside of application because that belongs to the application. What I don't see very often

Starting point is 00:24:14 inside of application, inside of the repo with the code, is the information about observability, right? I feel that there should be some kind of dashboard defined by me. This is the things that,

Starting point is 00:24:30 or by me together with somebody who really understands it, that's fine. And kind of every application creates its own dashboard with its own things that matter to that team. I mean, this is interesting because we've been promoting this for a while, at least

Starting point is 00:24:45 with the people and companies we interact with. We call it observability as code that runs alongside with your code as code. So that means you as a developer can, through code, configure what metrics, what logs, and what type of traces are actually relevant for you. You can, but you don't necessarily have to specify a dashboard because if you say these are the five metrics that are relevant and I'm expecting these metrics to be in a certain range, you can then translate this into the configuration

Starting point is 00:25:15 of your observability tool to get automatic alerting or put them on a dashboard if you really need to drill down into an error. That's also why I want to switch over to Crossplane soon, but there's one more thing. But just before, to finish that thought, this is also why we are building from a Dynatrace perspective, and I'm pretty sure Datadog, New Relic, all the others, they're also thinking about how can I use Crossplane to also configure my observability? How can I configure my ingesting rules for logs? How can I configure my alerts, my SLOs, and my dashboards

Starting point is 00:25:51 through tools like Crossplane? Oh, there are many, many, many answers to that question, man. Let me start by saying that one of the things that personally, to me, attracted me most to Crossplane, and I'm talking now before I joined Upbound, is that it is, let's call it Kubernetes native, meaning that a short answer to your question can be anything you want that works with other stuff in Kubernetes, right?

Starting point is 00:26:22 Hey, whatever is the way you're collecting logs from your applications running in Kubernetes or whatever else you have in Kubernetes, you can use it with Crossplane because it works exactly the same way. Events and so on and so forth, right? Statuses, everything, you do not necessarily need to change

Starting point is 00:26:39 how you collect information from Crossplane than anything else. And it's very unlikely that Crossplane will ever build anything in that direction simply because why would it, right? That's the part we like about Kubernetes in general. I can focus on this and I know that since we've all followed the same standard, it can work with everything else in the landscape.

Starting point is 00:27:08 And then there are certain metrics that, again, OpenTelemetry, just as you mentioned before, Crossplane now exposes OpenTelemetry so that you can see. But the data that is mostly related to Crossplane itself, not necessarily useful to the end user, right? Hey, is this up and running? Crossplane itself? What is the frequency of this or that? And so on and so forth, right? Where we are missing, in my opinion, is that developer story.

Starting point is 00:27:39 That's the story I mentioned before. Actually, there are a couple of them. How can we propagate information up the stack? Because the way Crossplane works is similar to, let's say, Kubernetes deployments, right? You don't create directly pods when you work with Kubernetes. You create a deployment, the deployment creates a replica set, replica sets manages pods. So there is a tree of resources flowing down. But what you see really in that example is that I created a deployment, right?

Starting point is 00:28:12 And I'm not the Kubernetes ninja. And don't give me now, and this is now my complaint towards Kubernetes in general, just to be clear. Nothing to do specifically with Crossplane. I created the deployment, I'm not an expert in Kubernetes, and now you're telling me that I should look into pods. What is

Starting point is 00:28:34 that? That does not work well. And what I think that Kubernetes is missing, and Crossplane, through it as well, is how we can move. Imagine all the logs and events you have seen in pods. If we could somehow propagate it up through the replica set, through deployment.
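Editor's note: Kubernetes objects record their parent in `metadata.ownerReferences`, which is what makes the Deployment, ReplicaSet, Pod tree traversable. The Python below is a simplified sketch of rolling events up that chain using plain dictionaries instead of a real cluster; the object names and the `OWNERS` map are hypothetical stand-ins, and Kubernetes itself does not do this roll-up.

```python
# Simplified stand-in for owner references: object name -> owner name (None at the top).
OWNERS = {
    "shop-7d9f-abcde": "shop-7d9f",  # Pod -> ReplicaSet
    "shop-7d9f": "shop",             # ReplicaSet -> Deployment
    "shop": None,                    # the Deployment has no owner
}

def top_level_owner(name, owners=OWNERS):
    """Follow owner links upward until reaching an object with no owner."""
    while owners.get(name) is not None:
        name = owners[name]
    return name

def roll_up(events, owners=OWNERS):
    """Attach each event to the top-level object the user actually created."""
    rolled = {}
    for event in events:
        root = top_level_owner(event["object"], owners)
        rolled.setdefault(root, []).append(f"{event['object']}: {event['reason']}")
    return rolled

events = [
    {"object": "shop-7d9f-abcde", "reason": "ImagePullBackOff"},
    {"object": "shop-7d9f", "reason": "FailedCreate"},
]
summary = roll_up(events)
```

With a roll-up like this, someone who only knows they created the `shop` deployment would find the pod-level failure attached to the object they recognize, without having to know that pods exist.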

Starting point is 00:28:58 And that's, I feel, the big thing. Because most Crossplane users use Crossplane in some kind of a platform, right? You have usually two personas. One is I'm going to call that person service builder. I create a mechanism for you to be able to manage database, whatever the components are.

Starting point is 00:29:18 And you have a consumer. That consumer is completely lost. When things go wrong, when it works, brilliant. Just taking notes here because I think that's an interesting I remember my first experience when I deployed

Starting point is 00:29:33 it, the deployment, and then I looked into Argo and I wonder, wow that is a long tree of objects. I didn't even know what's happening behind the scenes. Like there's so many things happening just by doing a kubectl apply. How is that possible?

Starting point is 00:29:50 Exactly. And now, think of it like with Crossplane, that very often tends to be multiplied by 2, 5, 10, right? Because in that Argo CD application, I'm assuming you saw some deployment, creates replica sets, pods,

Starting point is 00:30:09 there was a servicing, there's a few other things, maybe ten. And in Crossplane, the number can easily, from one resource, come to a hundred, right? Because a lot of things

Starting point is 00:30:21 need to happen somewhere else. Yeah, coming back to your idea earlier with the different layers of information, if I'm the consumer of the template, let's call it a template of an app, and the only thing I need to do is fill out five things. Let's say the name of the app, the region where it should run, and maybe two or three other things. And that's it. I deploy it, everything works. But all of a sudden, if things don't work,

Starting point is 00:30:45 and I need to then figure out what does not work, and instead of having one instance, because the only thing I created is one, let's say, Crossplane instance of a composite, and now I see 50 different things, how should I know, if I'm not the creator, if I'm not the architect of this template, where to look? And this comes back to your thing, having the different layers. For me, maybe as a developer on the top, it would say orange. Maybe this is something I can change because maybe I chose a wrong value for the region or a wrong sizing. I was in the wrong spec. But if it's something underneath that, let's say, AWS couldn't provision

Starting point is 00:31:27 certain things because your account ran out of, I don't know, IPs, then this is something I cannot fix. Exactly. Exactly. Maybe a good starting point, I feel, would be to think about that top-level interface. So let's say that you create some kind of interface that allows people

Starting point is 00:31:50 to create applications, right, without going into ingresses and services and gateways and whatnot, right? And now you ask yourself, let's say there are two scenarios. One is people want to be able to specify memory and CPU requests and limits, right?

Starting point is 00:32:09 And then I would say probably a good starting point is to give people metrics about memory and CPU consumption, right? If they specify that they want one gigabyte of memory, it's probably very important information for that same person how much actual memory it is using right now. But let's say scenario two, the developer says, you know what, I don't want to specify memory and CPU. I want your system to manage it somehow. It's not something I care about.

Starting point is 00:32:37 It's not something I know. That person should not see those same metrics. Because if I said I don't want to deal with that, then what benefit I get from you giving me the runtime information of the things that I don't want to know? So that's a great advice

Starting point is 00:32:55 for people that are building platforms. That means you will probably build templates for different personas and different needs. Obviously, I think building something that will cater all the needs will be tough, but you want to provide by talking with the different teams what are their needs and what do they want. And then based on that, maybe provide a checkbox

Starting point is 00:33:14 using, let's say, Backstage as an example, if that's your IDP. You may have a checkbox that says, my application is CPU-bound or memory-bound, and this is why this is very important for me. Here's some specs. And this also includes a dashboard and an alert based on this. That's basically exactly what it is.
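Editor's note: the two scenarios discussed above (a user who sets memory and CPU versus one who leaves them to the platform) suggest deriving the visible metrics from the fields the user filled in. The sketch below assumes a hypothetical spec format and metric names; it is not tied to Backstage, Crossplane, or any observability tool.

```python
# Hypothetical mapping from fields a user may set to the metrics worth showing them.
FIELD_METRICS = {
    "memory": ["memory_usage_bytes"],
    "cpu": ["cpu_usage_cores"],
}

def visible_metrics(app_spec, field_metrics=FIELD_METRICS):
    """Show runtime metrics only for the settings the user chose to manage."""
    metrics = []
    for field, names in field_metrics.items():
        if field in app_spec:
            metrics.extend(names)
    return metrics

# Scenario one: the user specified memory and CPU, so they see both metrics.
explicit = visible_metrics({"name": "shop", "memory": "1Gi", "cpu": "500m"})
# Scenario two: the user left sizing to the platform, so they see neither.
hands_off = visible_metrics({"name": "shop"})
```

The same mapping could also drive which dashboard panels and alerts get generated for a given template, so a user who opted out of sizing decisions is never paged about them.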

Starting point is 00:33:35 Cool. I had a different thought earlier on observability, which I now just remember. But as we are in the Crossplane topic, I will take this for the end. Crossplane: for those folks that have not touched Crossplane yet, what do people need to know about Crossplane, and especially when they start using Crossplane, what mistakes should they not make? So let me start by saying that Crossplane is a, and now I'm underselling it just to be clear, but I think it's a potential solution only for people who are very comfortable with

Starting point is 00:34:13 Kubernetes, right? Because there are so many things that are simply not even explained anywhere in Crossplane. Crossplane assumes you understand Kubernetes, right? How to get things, how to create things, how the tree of resources works, and so on and so forth. Now, there are two main components in Crossplane, right? One is what we call managed resources.

Starting point is 00:34:36 So let's say, and they come from providers. So let's say that you want to work with AWS, you install a provider, and when you take a look at what you have in that Kubernetes cluster, like kubectl get crds, you will see hundreds or thousands of new resource types in your cluster that correspond with AWS. You would see EC2, VPC, just as you have Deployment, Ingress, and whatnot. You extend the Kubernetes API with custom resource definitions and controllers that match something that you want to manage somewhere, like AWS. That's one part. And now, that was the important

Starting point is 00:35:19 part that almost nobody uses directly, because you can indeed create EC2, VPC, and stuff like that, but you won't. The second part that is more important, at least in my head for Crossplane, is what we call compositions. And that's the ability not to extend Kubernetes by installing some provider

Starting point is 00:35:43 that comes with predefined resources, but you create your own interface and you say, okay, in my company, I would like to have something, I would like to enable people to manage databases and I'm going to treat it as a product. I'm going to go and speak with people and I'm going to ask you, what do you care about when you manage databases? And they will tell me, oh, I want to be able to choose between Postgres and MySQL. I want to be able to

Starting point is 00:36:07 specify the version, and so on and so forth. You figure out what is the product you want to build, what is the service you want to build, and then Crossplane allows you to create a new API in Kubernetes that matches the specification, basically the schema,

Starting point is 00:36:24 and the controller that will accept resources based on that schema, let's say my database, and expand it into those managed resources. So an end user creates, let's say, my database, and then you instruct Crossplane, okay, whenever somebody creates my database, take those parameters and create VPC, subnet, RDS, schema with Atlas, and so on and so forth, right?
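The expansion described above, one small user-facing request turning into several lower-level resources, can be sketched as a plain function. This is only an illustration of the idea in Python; it is not Crossplane's API or its composition format, and the field names and resource kinds (`VPC`, `Subnet`, `RDSInstance`, `Schema`) are hypothetical labels chosen to mirror the example in the conversation.

```python
def expand_my_database(claim: dict) -> list[dict]:
    """Expand one 'my database' request into the resources behind it.

    The claim carries only what end users said they care about:
    a name, an engine, and a version (hypothetical schema).
    """
    name = claim["name"]
    engine = claim["engine"]
    version = claim["version"]

    # The platform team decides which choices are offered at all.
    if engine not in {"postgresql", "mysql"}:
        raise ValueError(f"unsupported engine: {engine}")

    vpc = {"kind": "VPC", "name": f"{name}-vpc"}
    subnet = {"kind": "Subnet", "name": f"{name}-subnet", "vpc": vpc["name"]}
    database = {
        "kind": "RDSInstance",
        "name": name,
        "engine": engine,
        "version": version,
        "subnet": subnet["name"],
    }
    schema = {"kind": "Schema", "name": f"{name}-schema", "database": name}
    return [vpc, subnet, database, schema]
```

The end user only ever writes the three-field request; everything returned by the function is the platform team's decision and can change without the user-facing schema changing.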

Starting point is 00:36:51 So it orchestrates. So imagine that you're designing the equivalent of a Deployment that will create a ReplicaSet and Pods. And you are in control of what the schema of that Deployment will be. Yeah. And when I saw compositions first, and I'm not sure who presented it, it may have been you, but I think it was actually somebody else who was also really big on, I think it was Salaboy. I think he had it in his book. Yeah. Exactly.

Starting point is 00:37:20 I think he brought the example: you have a composite, let's say, a business application, and you could deploy it in multiple instances. Let's say you're an organization and you provide a multi-tenant deployment, but really your tenants are actually individual deployments of the same app. And you can say, I now need to deploy for customer A this app in this version in this region with this database size, and now I need to do it for customer B in this region. Then you could create a composite that basically does exactly what you said, right? It's provisioning the resources in the target region. Then there are certain services, maybe from AWS, if you use their database services, and then whatever else needs to be done. But from a platform engineering perspective,

Starting point is 00:38:05 the only thing I need to do in order to create one of these new app instances, I just specify a simple object that says region, name, size. Boom, that's it. Exactly. And then there is the... This is going to sound strange, but I feel that the scope of Crossplane

Starting point is 00:38:27 is smaller than other tools that manage resources. I'm intentionally saying resources instead of infrastructure because in my head it's blurred. I don't know anymore what is infrastructure, what is something else. Anyways, the scope is relatively smaller than other tools.

Starting point is 00:38:50 But then the scope of what you can do is wider than any other tool, which might sound strange, but that's going back to what I was explaining before about Kubernetes itself, right? Other tools will give you ways how to deal with logs. Crossplane doesn't. Other tools will give you ways how to do deployments of that something. Crossplane doesn't.

Starting point is 00:39:09 But once you combine it with, let's say, the CNCF ecosystem or the Kubernetes ecosystem, then actually it explodes, right? Let's say you want to deploy it by pushing something to Git, aka GitOps, aka Argo CD or Flux, yeah, just do it, right?

Starting point is 00:39:31 We're talking about observability, right? Part of what you would do, among other things, could be collecting with kubeScape metrics, let's say, events of everything that is happening, storing it somewhere. Yeah, go ahead, right? The scope of what you can do, and this is not Crossplane-specific, it applies to anything else designed for Kubernetes, is basically limited only by the wider Kubernetes community itself, right? Yes. So one of the use cases for which we are using Crossplane internally, and I think I may have connected it with Bernd Warka on our team,

Starting point is 00:40:13 is using Crossplane to instantiate and manage Git repositories, GitHub repositories. So we have a Crossplane composite, right, that basically allows you to create these Git repositories just through a Kubernetes CRD, which is very elegant. Oh, yeah. Yeah. And especially if you combine different things, right? Let's say you might want to create a Git repository and then apart from Git repository, create something

Starting point is 00:40:45 in Google Cloud, right? And apart from that, create maybe some schema with some other tool and so on and so forth. When you start combining different destinations, then it becomes really interesting. Yeah, so basically, and let me know what you think about this definition. Instead of me having to write an operator for my special database applications, I can basically use Crossplane as a declarative operator almost. In a declarative way, I can say, what do I want?

Starting point is 00:41:18 And Crossplane becomes the operator that fulfills that promise of the operator. Exactly. From a certain perspective, if you would remove all the infrastructure, resource management and stuff like that, you could define Crossplane as being yet another tool to create Kubernetes operators. Cool. One more question on the provider. So you brought the example that you could import an AWS provider. Is there a concept in Crossplane if I am cross-cloud or cross-provider? Let's say I have AWS, Google, and Azure, and I can say I need a virtual machine. And then behind the scenes, it figures out which exact provider to use based obviously on some values and parameters?

Starting point is 00:42:08 Yeah, I mean, as long as there is... You can generate logic, yes. And that logic can be in many different forms. You can have different compositions, one for AWS, one for Google Cloud, one for Azure, which is boring. Or you can have logic inside the code. I'm going to call it code, even though it can be pure YAML, that says, hey, this and then that and loop and this and whatnot, right? And that's especially true since initially the whole idea about composition

Starting point is 00:42:47 was to be purely declarative. Now more and more people are keeping declarative part for the end users. So you declare the state by creating an instance or resource based on the API that we extended. But behind the scenes, what's happening in a cluster, more and more functions are in charge of it, right? And with functions, you can essentially, now I'm talking exclusively about the part happening inside the cluster,

Starting point is 00:43:20 not what you see as the end user. You can theoretically at least say, hey, I want to define this in Python, right? And in Python, I'm going to have a logic that if a person put a parameter of this or that, I'm going to take a bit from Azure, and then if this or that, then I'm going to combine it with... Basically, you're in charge of defining how does user input translate into tangible resources. And those resources can be anything that can be somehow described through Kubernetes.
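The kind of logic described above, where user input decides which provider's resources get created, might look roughly like this. It is a sketch in plain Python only: it does not use Crossplane's function SDK, and the parameter names and resource kinds in `VM_KIND_BY_PROVIDER` are hypothetical placeholders rather than real Crossplane resource types.

```python
# Hypothetical mapping from a provider label to a VM resource kind.
VM_KIND_BY_PROVIDER = {
    "aws": "EC2Instance",
    "azure": "AzureVM",
    "google": "ComputeInstance",
}


def resources_for_vm(params: dict) -> list[dict]:
    """Translate a generic 'I need a virtual machine' request into
    provider-specific resources, one per requested provider."""
    providers = params.get("providers") or ["aws"]  # assumed default
    resources = []
    for provider in providers:
        kind = VM_KIND_BY_PROVIDER.get(provider)
        if kind is None:
            raise ValueError(f"unknown provider: {provider}")
        resources.append({
            "provider": provider,
            "kind": kind,
            "name": f"{params['name']}-{provider}",
            "size": params.get("size", "small"),
        })
    return resources
```

Passing `{"name": "web", "providers": ["aws", "azure"]}` yields one resource per provider, which is the hybrid case; the person building the platform decides what each label means.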

Starting point is 00:43:59 Because I thought, you know, I was actually going into a different direction. I wondered if you had like a class model where there's like an instance type of, let's say, a virtual machine. And derived from it, there's an EC2 and there's like an Azure equivalent and a Google equivalent. So basically, they're all kind of like the same base class. I thought I was just wondering if that's how it works or not. Oh, yeah. Maybe what I'm going to say is, again, me misunderstanding, so correct me on that one.

Starting point is 00:44:34 But when you create what we call a configuration, you have different... Like, I have a configuration for Kubernetes clusters, and then over there I have, let's say, different logic or instances that say, hey, if you want a Kubernetes cluster, you specify, let's say, the Kubernetes version, and you put the label AWS, right? And then it will be

Starting point is 00:45:03 in AWS. And with exactly the same definition, you just change it to Azure, or whatever the logic is, and it will happen in Azure. And then hybrid, and then it will happen in both, and things like that. But it truly depends on

Starting point is 00:45:18 you as the builder in a way, right? Crossplane does not come with opinionated, predefined anything. Okay, that was my question. If there might be like an object model or a class model already, and then if I am a vendor, if I am a provider, I can say I am a specialized instance of a cluster

Starting point is 00:45:44 or a virtual machine or a database. There are things like that in the marketplace. So you can go and find what you're just mentioning from AWS itself. I think that they have a couple of their own configurations, for example. But usually people take those things as a starting point rather than solutions. Because the goal for a majority of users of Crossplane is not to use something built by somebody else, but to create something that is specifically tailored to whatever your needs are, right?

Starting point is 00:46:25 So usually you would take those, let me call them blueprints, right? I just invented the name, as blueprints rather than the end solutions, even though they are end solutions, right? You can just run it as is. So let's say that that would be a difference between, let's say, eksctl, right?

Starting point is 00:46:46 Which is great, but highly opinionated. But then if you need it to go outside of that, then you would need to be able to change the behavior itself. Cool. So, Viktor, on the

Starting point is 00:47:02 Crossplane topic, I think there's a lot of material out there. Obviously, folks, we will link to Crossplane. Crossplane.io, I guess, is the landing page. And from there, you'll find everything on GitHub. I assume, Viktor, you also have a lot of material on your YouTube channel. Yeah, if you go to the channel, which is DevOps Toolkit, I probably created one or two... if you go to the playlists, you will see Crossplane or something like that. Actually, and this is completely free, not selling anything, I created a tutorial maybe half a year ago. There are like six, seven videos,

Starting point is 00:47:47 something like that. Yeah. That you can just go as if from scratch to everything, right? Perfect. Yeah, folks. So we're linking to the DevOps toolkit, which is now actually a good segue into my last question. That was the one I had before.

Starting point is 00:48:00 We started the podcast with observability for developers. How about observability for DevOps? Because obviously DevOps Toolkit, right, is your big channel. And here the question is, how much observability do we need and can we have and should we have when we think about everything that happens from the first code commit until that code actually gets deployed and running in production, until it gets retired? Do you have any thoughts on that, on kind of really observing the whole lifecycle of an artifact?

Starting point is 00:48:35 Yeah, I feel that that's similar or the same answer as what I gave before in terms that I strongly believe that each team should own something. Now, whether you own a front-end application or you own a tool, internal tool or something third, right? You should own something. And owning means, among other things, observability from your laptop, right? All the way until it is in production.

Starting point is 00:49:07 Everybody else can help you, not take that responsibility on themselves. I'm a bit fuzzy, to be honest, on what DevOps is. I might be the only one who struggles to understand

Starting point is 00:49:23 what DevOps engineers do. Yet you have a YouTube channel that has DevOps in its name. Oh yeah, I know, I know. I'm a tricky bastard, to be honest. But whatever you do, right, I will generalize it more than be specific for DevOps:

Starting point is 00:49:49 whatever you do, own it, right? And owning it means making sure that it's running in production. That's the only thing that matters.

Starting point is 00:49:56 Yeah. One of the things that we are trying to do, as I see it, is really making the life cycle of an artifact observable.

Starting point is 00:50:05 So whenever an artifact initially gets created, it gets some type of unique ID. It could be the version of the container or connected with your Git hash. And then as it travels through the different stages, I think then just generating or extracting some information about, hey, it was deployed in this environment. Everything is healthy. It was promoted by this pipeline into the next environment. Now it's in production. Now it failed. So I think this is something also that I know the CDF was promoting with their CDEvents, like using

Starting point is 00:50:40 continuous delivery events. And we also, internally at Dynatrace, have kind of latched onto that standard but extended it a little bit, and we are promoting it back because we think continuous delivery is only one part. You want to also have the whole life cycle of the artifact extended until the product is

Starting point is 00:50:58 eventually retired and that includes every operational aspect until it gets replaced or retired for whatever reason. Yeah, 100%. I mean, if that artifact you're measuring, if that's your product, let's say, right? Then yeah, you're responsible for it. And you need to see how it goes through all those phases

Starting point is 00:51:18 and so on and so forth. So, big yes. Yeah, and also, like with the car analogy we had earlier, things like monitoring how often a particular type of car has to go to the shop, because, let's say, you know, there are certain things that happen, right? I think this is actually interesting. Like, the analogies are always great to use. So hey, if I'm an automaker and I see that the latest version of this particular car is coming in 50 times more often into the

Starting point is 00:51:45 shop than the previous version, then there's obviously something wrong, and then we need to figure out what is wrong. And so, you know, in the end it's engineering what we do, and as engineers we need facts to make better decisions. And yeah, and there are different, I'm going to stick with the car analogy, right, there are different levels of what you might be producing, right? You might be making cars, and then everything you just said is 100% correct. But you might also be a tire company, right? You might be making only tires.

Starting point is 00:52:16 And you're still responsible to ensure that, hey, less than, I don't know, 0.01% of tires break within the first three years. I'm inventing the rules. I don't know what the rules for tires are. But you're still responsible for them, right? You're the one who is going to ensure that the quality of those

Starting point is 00:52:35 things is whatever we agreed, right? What I'm trying to say, and I think that I'm going back to when you mentioned artifacts, right? It could be a car or it could be only a tire. I don't know what you're making, but whatever you're making, you're responsible for it. Like any application: an application is not a single pod. It consists of many different individual pieces that come from different producers, potentially, right? And the whole thing is the one that needs to work reliably and securely.
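The artifact lifecycle idea discussed above (a unique ID assigned when the artifact is created, then events recorded as it moves through environments) can be sketched with a small in-memory log. This is an illustration only: it does not follow the CDEvents schema or any vendor's event format, and the class names, stage names, and status values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LifecycleEvent:
    artifact_id: str  # e.g. a container image version or a Git hash
    stage: str        # e.g. "staging", "production"
    status: str       # e.g. "deployed", "healthy", "failed", "retired"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class LifecycleLog:
    """Append-only record of what happened to each artifact."""

    def __init__(self) -> None:
        self._events: list[LifecycleEvent] = []

    def record(self, artifact_id: str, stage: str, status: str) -> LifecycleEvent:
        event = LifecycleEvent(artifact_id, stage, status)
        self._events.append(event)
        return event

    def history(self, artifact_id: str) -> list[LifecycleEvent]:
        """All events for one artifact, in the order they were recorded."""
        return [e for e in self._events if e.artifact_id == artifact_id]

    def current(self, artifact_id: str) -> Optional[LifecycleEvent]:
        """The most recent event for an artifact, or None if unknown."""
        events = self.history(artifact_id)
        return events[-1] if events else None
```

With something like this in place, the team that owns an artifact can ask where it is right now and how it got there, from the first deployment until it is retired.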

Starting point is 00:53:08 Cool. Hey, Viktor, thank you so much for the time. As I said, we will link to your YouTube, to Crossplane. Thanks for your insights. I really liked the initial discussion we had around how different people have different needs when they look at the data. So if we have, from an observability perspective, all the data, then we need to

Starting point is 00:53:32 provide different layers and different views because everyone has a different need. You don't want to overwhelm people. Yes, developers are responsible for observability, but it doesn't mean they live and breathe observability like somebody does in production. Therefore they may need a different tool, or at least a different view on that data, to not overwhelm them and just make them productive. Yeah, and I feel that, and with what I'm going to say, correct me at any moment, because I might be saying something completely silly.

Starting point is 00:54:08 With observability, it's much more likely that you don't need to change the tool than with other tools. And the reason I'm saying that is that observability is mostly read-only. So you're not performing actions. You're not doing much operations from observability because if you do, then it would be more likely

Starting point is 00:54:33 that you really need a different tool because the tool to perform your day-to-day is much harder to make work for all. Observability, I don't see a good reason why you wouldn't use, let's say dashboards, right? Whatever you're using. There is no need to change the technology, right? You just need a different dashboard.

Starting point is 00:54:56 You don't need to change the tool itself. But you're right in a way. On the other side, observability vendors, and I'm now speaking from a vendor's perspective. And I think, again, this is true for us, for Datadog, New Relic and everybody out there. We are not just read-only dashboard generators, but most people use observability these days especially for the analytics piece. And every vendor tries to have a unique angle on how we analyze the data and provide value. And so for just the observability piece, I think you're right: we can easily rip out one tool and put in another tool, especially with standards like OpenTelemetry, Prometheus, or logs in general, right? It's easy to then just send the data to some other endpoint. Where it becomes a little

Starting point is 00:55:43 more tricky if you're using kind of like the additional things on top of an observability platform. A lot of the platforms also like ours, we now provide automation workflows that can trigger on an anomaly and you can create actions like remediation actions. There it becomes a little harder because a company makes obviously an investment

Starting point is 00:56:03 and then you would need to rebuild this. But the rest, I completely agree with you. But maybe I misunderstood. I wanted to say that for observability, there are fewer reasons to change, to use different tools for different… Okay. Yeah, I mean, fewer reasons maybe, but it's also, I think there are changes that are happening.

Starting point is 00:56:29 If you would only see the read-only aspect of observability. Because if I look at a Grafana dashboard with five metrics, or if I use a Dynatrace dashboard and I'm only using the dashboard, then I can use either of the two. But what I've also learned from my last podcast with Hans Christian, the big challenge is what do you do with the amount of data? You don't have the expertise necessarily in your organization to understand what data is relevant, what is not relevant, and therefore to identify anomalies.

Starting point is 00:57:01 And so this is where I think the vendors also come in and try to provide answers on top of the data, like the different levels that we talked about, right? We can give the developers exactly what they need and the SREs exactly what they need. And I think this is why people typically also stick with an observability platform for a little while longer. Just my thoughts.

Starting point is 00:57:26 Cool. Viktor, thank you so much. I hope I will see you. I think we are actually both speaking, if I'm not mistaken, at the Cloud Native Days in Bergen. It's the end of October in Norway. Oh, you'll be there. Cool. Exactly. I'm a bit scared

Starting point is 00:57:42 to be honest. It's Norway in November, October. It already starts to be scary. But we'll see. Yeah, you know, I'm sure there are some beverages that will keep us warm. Okay, then I'm definitely in.

Starting point is 00:57:57 Yes. Cool. Thank you so much, as I said. And yeah, looking forward to more episodes of the DevOps Toolkit channel. And yeah, folks, if you ever meet Viktor, give him a high five because it's great content that you create. Thanks, man.

Starting point is 00:58:18 Thank you.
