PurePerformance - 056 The State of Monitoring in a Kubernetes World with Brian Gracely

Episode Date: February 26, 2018

New to Kubernetes? Already a pro? In both cases, tune in to this episode, as we have something for both sides of the aisle. Kubernetes seems to have won the container orchestration game. Major cloud and PaaS vendors are supporting Kubernetes, and attendance at KubeCon in Dec 2017 skyrocketed. Today we chat with Brian Gracely ( https://twitter.com/bgracely ), Director of Strategy at Red Hat. Brian also co-hosts @PodCTL ( https://twitter.com/PodCTL ) – a podcast dedicated to containers, OpenShift, Kubernetes, and Cloud Native. In our chat we learn what Kubernetes is right now and where it's heading (e.g., providing a better onboarding experience for developers, more APIs …), why we have to pay attention to Service Mesh ( http://philcalcado.com/2017/08/03/pattern_service_mesh.html ), and why it is important to have a good cross-technology monitoring strategy that supports both your brownfield legacy services as well as your greenfield cloud native ones. We also enlighten you about what the BWI (Brian Wilson Indicator) is!

Transcript
Starting point is 00:00:00 It's time for Pure Performance. Get your stopwatches ready. It's time for Pure Performance with Andy Grabner and Brian Wilson. Hello, Andy Grabner. This is Brian Wilson. How are you doing today? I'm good. I survived the first snowstorm of the season in Boston. Well, good luck. I mean, you know what? I'm just going to say right now, I can't think very well. I'm not feeling good, everybody. So I'm going to use that as an excuse for anything I can today, including my flubbed reaction to your snowstorm. You know, we here in Denver haven't really had any snow yet. People think it's really cold in Denver, but it's not.
Starting point is 00:00:56 Yeah. Well, I also wouldn't really call it a storm. Just because it snowed for a couple of hours doesn't make it a real storm. The snow is almost gone. Well, I'm sure people reacted like it was a storm. They ran to the grocery store to get their milk and bread and eggs and stuff, right? I hope you got there. It was the number one topic on the news for two days, right? It gave them something to report on other than what they report on anyway all the time. Yes, so that was good. So, Brian, the good news is that we have another Brian today. We do. It won't be confusing at all.
Starting point is 00:01:26 No, not at all. And I would like, if you're good with it, I just go ahead and introduce our guest. I think that would be appropriate. Are you ready? Yes. Awesome. So when I'm flying around, I typically try to download podcasts and listen to things and kind of get up to speed on things. And one podcast I ran into is actually run by two guys, and we got one of them today, Brian Graceley. And Brian runs – actually, before I – instead of me explaining it, Brian, are you there with us? And maybe you want to introduce yourself and tell the world who you are?
Starting point is 00:02:02 Yeah. Hey, Andy. Hey, other Brian. We'll figure out how to differentiate that. You can just call me other Brian. There you go. So my name is Brian Grace Lee. I'm director of product strategy at Red Hat. And I assume the podcast that you were listening to is one that we just started maybe about four or five months ago.
Starting point is 00:02:22 It's called PodCTL, P-O-D-C-T-L. And it's mostly focused on this new technology called Kubernetes. Well, I guess it's been around for a couple of years, but Kubernetes and also just kind of containers in general. So the whole ecosystem around what's going on with containers and container scheduling and microservices. Great. Cool, yeah. I actually listened to that podcast. I listened to a couple of your episodes and then I thought,
Starting point is 00:02:49 you know what, it would be really cool to get you guys on our podcast because we've been, I mean, we've, we focus a lot on performance engineering on monitoring. Obviously we talk a lot about some of the new things that come, whether it is around the technology aspect of things that people build, the cloud native apps. We also talk a lot about process change when it comes to everything that happens around the DevOps movement.
Starting point is 00:03:11 And I thought it would be great to get you guys on board. And I know that your colleague, Tyler, couldn't make it. But you guys have been at a large conference last week, KubeCon, I believe, which is, I assume, the number one go-to conference if people want to learn about what's happening in the Kubernetes sphere. Is that correct? Yeah, definitely. It was – so KubeCon – so I think officially it's called like Cloud Native Con and KubeCon. So it's run by the CNCF, which is a Linux foundation group. So last year they sort of started it
Starting point is 00:03:46 and made it formal. It was like a thousand people out in Seattle. And then this, this year, a year later in Austin was like 4,500 people. So yeah, it's definitely kind of become the, um, you know, aside from, you know, the big events like, like reinvent or, or something else, it's really kind of become the big place to go talk about Kubernetes and all this container stuff. You know, you know, it's funny. I just realized that when you're saying QCon, It's really kind of become the big place to go talk about Kubernetes and all this container stuff. You know what's funny? I just realized that when you're saying QCon, you're not saying QCon, like Q, the letter Q, because there was another one. I don't know if they're still around.
Starting point is 00:04:16 Is QCon with the Q still around? It is still around. Yeah, it is still around. I think it's more Java developer focused. Yeah, I guess it's a little confusing. Yeah, awesome. Anyway. So Brian, Grace Lee now. more java developer focused yeah i guess it's a little confusing yeah awesome anyway so so so brian uh gracely now actually by the way i just looked at your twitter uh page and i assume your the picture that is on there it's on purpose that it doesn't say brian gracely but but brain gracely
Starting point is 00:04:40 oh yeah i actually went to one of our we i – I was giving a keynote at one of our events, one of the Red Hat events, but it was in Turkey. And I assume somebody must have put my name in the prompter, and the prompter probably went, oh, that's a typo. You spelled brain wrong. So it's kind of a fun joke that you go somewhere, they spell your name wrong. Yeah. So coming back to KubeCon last week, what happened last week? What are the big things that are happening in the Kubernetes sphere and the OpenShift sphere when it comes to people talking about CNCF? Any highlights that you want to – maybe?
Starting point is 00:05:20 Yeah. I mean, I think the big thing, so, you know, let's say a year ago, just to kind of put it in perspective, we were all talking about, you know, where's Kubernetes going and like, how do we make it stable? How do we make it grow? You know, how do we add new features at sort of the container scheduler level? So like, how do we do stateful applications or how do we do batch jobs? And so there was a lot of stuff going on a year ago about, you know, how do you make this container orchestrator stable and support more applications and so forth? And there still was a lot of debate in the industry of, you know, is Kubernetes really kind of the best technology? You know, Docker had their own version of a technology called Swarm. They still do. There was some technology called Mesos that had spun out of Twitter. And so there was still kind of a lot of debate in the industry a year ago of, you know, will one of these sort of standards or implementations kind of win? This year, that discussion's kind of gone away. Pretty much every major vendor is now supporting Kubernetes. Every major cloud
Starting point is 00:06:25 provider is now supporting Kubernetes. And so the discussion really kind of shifted from, you know, what can Kubernetes do at the lower level container scheduler to really a lot of discussions about, you know, how do we make it easier to get applications onto Kubernetes? Are there frameworks now that are going to be more kind of Kubernetes native to help developers, you know, understand these constructs of availability and scale out and stuff. And so there was a lot of discussions about service meshes, a lot of discussions about kind of new developer frameworks to make it easier. So definitely a shift in the discussion from lower level container stuff to much more kind of developer productivity,
Starting point is 00:07:05 developer tools and stuff like that. And is the discussion around companies, vendors are building stuff on top of Kubernetes or is Kubernetes also getting some built-in key functionality like a better orchestration? I don't know. So you mentioned service mesh.
Starting point is 00:07:22 Is that getting into Kubernetes or is this an option for vendors to provide services on top of maybe a better platform that Kubernetes is going to support, provide? Yeah, I think the answer is a little bit of both. So, you know, there's always a discussion about, you know, do you make something a native service in Kubernetes? So like, like, for example, Kubernetes has, you know, native services to do, you know, inbound and outbound routing, service discovery, things like that. And then there's, you know, then there's discussion of like, okay, once you start getting into developers, how much of it should be kind of abstracted. So to break it down, I guess, one of the things that
Starting point is 00:08:06 Kubernetes has done over the last couple of releases is it's always had this mechanism to where, you know, if you want to do default scheduling of applications, that's sort of built into Kubernetes. And then you started to have a lot of different use cases that would come along. Some of them would be like vertical industry specific things or, you know, stuff that was kind of out of the mainstream type of job. So, you know, batch jobs, long running jobs, short running jobs. So they've been working very hard at allowing you to build sort of custom controllers. They have a concept called custom resource definitions that allow you to say, okay, you know, I have some special, unique kind of workloads and we may want to build those specific to say, okay, you know, I have some special, unique kind of
Starting point is 00:08:45 workloads. And we may want to build those specific to say our industries, maybe it's a healthcare thing, or it's a, an oil and gas or financial services type of thing. So, so that's, that's one of those things where it's like, it's kind of built into Kubernetes, but the actual implementation, you know, would, would probably be vertical specific. So you see some of those things happening. You see a lot of people that are, you know, you see a lot of projects that are now emerging to say, okay, you know, in the past, you just kind of gave Kubernetes some containers, but, you know, the development experience wasn't much more sophisticated than that. Now we're beginning to see some frameworks. Um, so some, some new ways of doing packaging of applications. So there's a
Starting point is 00:09:30 project called helm. Um, there's some higher level constructs of, you know, how do we, you know, essentially try and shape what the developer kind of desktop experience looks like. And so you see some projects called like a draft. Microsoft has something called draft. Um, there's a few others. And then, um, you start to get into some stuff like people are kind of fascinated with this phenomenon of serverless, right? That, that AWS started with Lambda and they're trying to figure out, okay, can we bring that same concept of, you know, very short running functions, um, very, very, you know, scale down on what the developer writes. And so there's a whole discussion happening about, you know, what should we do for serverless? Should serverless be a feature of Kubernetes? Should people build frameworks,
Starting point is 00:10:17 say, on top of that custom resource definition? And so, you know, there's definitely still some gray areas between is something a feature of Kubernetes and does it just take package them up in a container, that means Kubernetes makes sure that at the scheduled time, there's enough container instances running of that service. They also know how to talk to each other because I assume Kubernetes does the service registry and the service brokerage. So that's there. In terms of orchestration based on load, is that also already built in? That means scale up in case of load, failover in case of certain services start producing error results, then shift traffic over to other instances.
Starting point is 00:11:24 Is this already available or is this something that you said is now coming? Yeah. So that's a, that's a really good question. There's actually a lot of pieces in there. So, um, so from purely the perspective of if I have an application and it needs to scale in some way, um, up till now, Kubernetes has sort of allowed you to scale that based on like CPU thresholds and to a certain extent, like memory thresholds from a node, a worker node. And you could set some thresholds based on, say, like percentage usage or something. eight begins to have at 1.9 and so forth, which are things that'll be available, you know, here in the springtime, starts to get much more granular about the type of metrics that you could use, or, you know, characteristics that you could use to scale it up. So right now, it's, it's pretty
Starting point is 00:12:15 simple, but it's, it's expected to get much more granular. So, so that'll help, that'll help both, I guess, developers as well as operators deal with, you know, let's say scale of an application, but also, you know, dealing with things like DDoS attacks or, you know, security threats. So that's one area that's focused a lot. So, you know, the other thing, and you sort of touched on this early on, so, you know, developer rights and application. One of the big areas that the Kubernetes community is starting to kind of realize is that that sort of, you know, you write your application as one thing. So you write something in Java or, you know, Go or whatever you write it in. But then you kind of have to go through this hand crafting exercise
Starting point is 00:12:55 of writing a bunch of YAML, sort of a description file or a manifest file that says, you know, here's what I want to happen to the application. You know, make sure it always has three instances running, you know, put a load balancer in front of it, you know, kind of descriptive stuff. And there's a big focus right now in the Kubernetes community to say, yeah, you can do things in YAML, but maybe that's not the best way for our community to adopt a lot of developers. Maybe they don't want to deal with all that stuff. So we are beginning to see some, you know, developer-centric tools that say, hey, let's hide a lot of that YAML stuff.
Starting point is 00:13:32 Let's simplify what the user experiences look like. And so we're seeing some of that too. And those things are really kind of outside of the Kubernetes scope in terms of, you know, dealing specifically. But, you know, again, getting to that beyond the first early adopters into more mainstream, how do we make it simpler for them? So a little bit of both of those things are happening. Yeah. And I wanted to, you know, going back to the idea of the auto scaling and moving beyond metrics like CPU and memory,
Starting point is 00:14:02 I was thinking that initially thinking, well, hey, if we see the service response time degrading, that might be a good one to feed in. But then as I was thinking about it, as you were talking, I'm like, well, that can be really dangerous too, because I'm bringing this up in terms of saying like feeding metrics into saying when to scale can be a very complicated concept right because if you have a service that's slowing down in response time you don't necessarily want to add more instances of it because that service might be slowing down because of something else downstream and adding more instances could exacerbate the problem so just a thought that i had that i wanted
Starting point is 00:14:40 to to bring up because it um just's the complexity of when to automatically scale based on data, I guess gets kind of a little wonky when the more you think about it. Yeah, definitely. Yeah, definitely has some, some, you know, positive and negative connotations. And I think you guys coming from the, you know, management monitoring perspective, probably have a very different perspective than, say, I don't know, a developer who just goes like, oh, it would be cool if I could do that. Would you put that feature in, put that nerd knob in there for me? Right.
Starting point is 00:15:14 Well, I think it's a great way to start, right? I mean, it's a great way to start with I want to build an application and I want to put it out there in the wild and I want to make sure that in case some know some i don't know some media outlet puts it up when i get some pr that it doesn't just break because i did i can't wrongly configure the instances i need so i think that's it's a great start but what we see in more complex applications and i think that's what the other brian wanted to get to if you have a kind of a chain of services if you start scaling up the the front end service, but actually don't know that with some of the accounts we work with, they are
Starting point is 00:16:07 using the monitoring data to figure out, hey, who is impacted, right? So the bottom line is, do we slow down our end user experience or do we make it worse? So do we are evaluating some SLAs on our services? And then where is the real failing component that needs to be fixed or scaled up? And then they use the monitoring data to trigger remediation actions or, you know, now we actually start calling itself healing. So from the monitoring data, we say we know that, let's say, 50% of our users, which translates, let's say, to 10,000 users are currently impacted that are using that particular feature. Root is not the front-end service layer,
Starting point is 00:16:47 but it's like a back-end database that cannot keep up because there's an index that's out of date. So this could then trigger an automated remediation action that actually fixes the problem at the root instead of going with the default action, which may be adding more front-end web server instances to handle the incoming load. So I think we're seeing it more right now from the other direction where the monitoring triggers an action. And that action can be implemented in your orchestration tools, whatever you use. And then actually triggering the correct actions using the APIs that like a Kubernetes provides and kind of triggering scale up and scale down from another tool. I think that's what we are seeing.
Starting point is 00:17:32 Yeah, no, I definitely – I think that's a good summary because what it does is it really highlights that the Kubernetes community – and I've never really heard this discussion from them to say, hey, we want to, you know, try and take all the advanced intelligence that platforms like Dynatrace have and try and replicate those, you know, in this sort of lower level container scheduler system. I think they're still very much saying, we want to give you some basic primitives to be able to do things, but, you know, allow your advanced systems to say, yeah, this is what the picture of these microservices really look like. This is where your faults really exist. This is the correlation between these things. And so, yeah, I don't think that that shift in sort of knowledge and awareness of sort of the complexity of all
Starting point is 00:18:20 these microservices is shifting down into the platform level at all. I think, you know you know, it's, it's still very much going to be in these much more high level intelligence systems. And it makes, it makes absolute sense. Cause like you said, um, number one, we're still in really early days with, with understanding how all these distributed systems work. But, but number two, um, like what, what looks like the cause could very well not really be where the cause is. It's just like the symptom, right? Yeah. Exactly. But I mean based on what you explained, I think it's the right way to go, right?
Starting point is 00:18:52 You're building and so Kubernetes is extending its basic – it's a core functionality. I'm sure there's a lot of new APIs that come up. It will build in some of the basic use cases to make onboarding easier on that platform, right? As you said, the kind of the first initial experience should be, hey, this is something I can actually use and work with and build my first app. Because what we see a lot with our enterprise customers, they try to figure out what is the next platform that they're using to build the next application.
Starting point is 00:19:21 Is it going to be Kubernetes based or do I need to go to Cloud Foundry or what do I go and build everything on top of what Microsoft provides to me? And I think if you have this, if you're making this initial kind of prototype proof of concept first project experience smooth and easy, then it's more likely that people will stick with that platform and then figure out, how do we then really scale this to larger more complex enterprise applications yeah yeah i think that's what it's going to be yeah and i think you know at least um you know one sort of trend we're seeing in the industry is that that that first decision of like which platform do i choose um the industry is sort of moving towards towards kubernetes so i I mean, you know, obviously, you know, folks like us at Red Hat have been doing Kubernetes for about three years in the OpenShift platform.
Starting point is 00:20:10 But, you know, we've seen Microsoft making huge bets around Kubernetes. You know, they acquired a company called Deus and just expanded their Kubernetes offering in the Azure cloud. AWS made an announcement two weeks ago that they're now formally supporting Kubernetes. Google obviously has done it for a while. But even like the Cloud Foundry community is now, you know, supporting Kubernetes sort of as a parallel platform to what they tend to do with, you know, like Spring Boot and some of the 12-factor stuff in traditional Cloud Foundry. But even they're supporting Kubernetes as well. So I think we're going to see that accelerate more and more because people can now look at the industry and go, okay, they all seem to be agreeing on one standard in general. And I think the biggest
Starting point is 00:20:53 indicator that Kubernetes has kind of won is that I myself am trying to make a concerted effort to learn it. So if I'm doing it, then it must be. Yeah. So it's funny, actually, after we spoke with Martin and we did the OpenShift one, that's when I was finally like, you know, so Brian, my role, I'm like, I'm a sales engineer and we have to stay on top of a lot of this technology, but it also helps for us to specialize in certain areas so that we can all help each other out. So after we talked with Martin, one of our colleagues, he did a basics of OpenShift several episodes back. And I was just thinking, man, you know what, I'm just going to go figure, you know, get my hands dirty with Docker, move on to Kubernetes, and then move on to OpenShift and really just do that track. And I was thinking, and you addressed this earlier, I was thinking, boy, I wonder what's happening
Starting point is 00:21:46 with the competition to Kubernetes. And Andy and I were just discussing, and we were like, yeah, we've got to make sure to ask about that. And it's funny that they really did become a dominant leader. I think a lot of the other cloud technologies are still a little fuzzy in which ones are coming out ahead. But Kubernetes really is a clear, seems to be,
Starting point is 00:22:06 and I'm not going to try to wave the finish line flag or nothing, but it really seems to be way out ahead of everyone else. Yeah. And I want to coin a new term. I think we need to define the BWI, the Brian Wilson Indicator. So if Brian Wilson is betting on that technology, it's probably mainstream. If I'm going to try to learn it in my spare time, then yeah,
Starting point is 00:22:28 it has to be the winner. Exactly. So now back to the other Brian. Actually, yeah. I'm the other Brian. He's the real Brian. That is really the real Brian. So Brian, on the monitoring side, I mean, we talked about
Starting point is 00:22:44 some aspects of monitoring, but anything else that happened last week at KubeCon, anything where monitoring plays a role? Any discussions about how monitoring has to change some of the requirements of monitoring in a Kubernetes microservice service mesh world? Yeah, I think, I mean, there wasn't a lot of, at least at a community level, there wasn't a lot of monitoring specific sort of net new announcements. You know, I think that a couple of things to take away from that from a monitoring perspective, one of them is just the number of production customers that are now running Kubernetes is, you know, up into the thousands and so forth. So I think from a monitoring perspective, you know, we're going to quickly see companies moving from, you know, things in POCs or, you know, small instances to very large things very quickly if people haven't already seen this. So, you know, that is always a good sign,
Starting point is 00:23:43 right? You find out a lot of things once they go into production and and you start to scale and so forth. So so that was a big trend. You know, I know from our perspective, like we've seen literally customers in in every vertical and in every part of the part of the world that are doing things with Kubernetes and starting to get to very, very big scales. I had a, I had a banking customer I was talking to just before KubeCon and they had said, you know, we, we got up to about a million transactions a day. This was like the day before KubeCon. And then I got a note from them this morning and they said, yeah, I know we're now up to 2 million transactions a day on our system. And we, we feel very comfortable in sort of where it's scaling. So that I think we're going to see more and more kind of across the industry, bigger, bigger production environments. The other one is this concept of service meshes. And you mentioned it a little bit.
Starting point is 00:24:31 It's not a completely new concept. But we are seeing this emergence of a couple of different projects or standards for service meshes, you know, sort of very targeted towards microservices. So there's a there's a project called Istio, I-S-T-I-O, which is, you know, jointly came, you know, originally started by IBM and Google and some folks at Lyft working on, you know, how to do east, west, north, south, kind of very granular routing of traffic, which obviously we'll play into where monitoring is. Lyft had a similar or a kind of a companion project called Envoy, E-N-V-O-Y, which is their proxy technology. But then we also saw some other technology. There's a project called Linkerd, L-I-N-K-E-R-D, which is another project that had kind of spun out of Twitter at one point.
Starting point is 00:25:28 And there was some other ones around there. But so there seems to be a little bit of competition in the service mesh space. So, you know, a couple of different types of implementations that are happening. But there's also, you know, kind of a feeling that these different learnings are going to come together in some sort of common way and that they're kind of being driven today by the web scale people. But, you know, these sort of start at the web scale things and then we see them kind of fall down into, you know, certain large scale, say, financial services, application applications or retail. So that'll probably be something that'll be more of a keep your eye on it toward 2018 and how that impacts places to integrate monitoring, places to integrate tracing and stuff like that. So because the idea of service mesh, as far as I know, and correct me if I'm wrong, it's
Starting point is 00:26:19 kind of like a, I should have proxy is the right word, but it's like a proxy service where ideally all the traffic goes through it and then the service mesh implements concepts such as circuit breaker, load balancing, making sure that in case, let's say, one instance behind the service is constantly returning errors, take it out of rotation. Maybe it'll end launching another one. And if every single transaction actually goes through service meshes, that's what you mentioned, I believe. Watch out for the monitoring perspective because service meshes are obviously another key component to monitor, I would believe, right? Because they will see all the traffic that goes through. They need to be operating just as fast because otherwise they become a performance hotspot and an availability issue. I think I've also seen some talk in blog posts about,
Starting point is 00:27:19 well, why not just only monitor service meshes to do end-to-end tracing? Because they're basically, if every transaction goes through a service mesh, you basically at least know who is talking to whom and what's the response time. I think I saw that somewhere, but I'm not sure if that's the only answer to monitoring, but I think it's an interesting initial approach. Yeah, there was definitely a bunch of conversations about it. I had a good
Starting point is 00:27:46 conversation with a gentleman named Ben Sigelman, who was kind of had been at Google, started this concept called open tracing. And then he had launched a company called Lightstream, or I guess Lightstream's name, he'd been around for a year or so, but they kind of came out of stealth. They're very focused on tracing. And he was fairly bullish on the idea of, you know, how do you integrate tracing then with the, with the service mesh? And again, like you said, you know, having this very granular, uh, you know, hop by hop, uh, service by service kind of visibility. Um, you know, I think there's also, you know, the, the, the one, the couple of ways I've heard this sort of explained at a real simple level is, you know, a lot of the previous sort of service mesh frameworks, if you will.
Starting point is 00:28:29 So things like, you know, what Netflix, the Netflix OSS stack had done and was kind of language specific. So, you know, you'd end up having teams that would build stuff language specific. This is trying to be sort of language independent, if you will. So it's a little more of an infrastructure ops way of going about things. And then, you know, like you said, you know, having a proxy or sort of a sidecar proxy deployed with every application, on one hand sounds very interesting, because you get this very granular visibility. And then the flip side becomes, okay, if we're routing everything through all these proxies, you know, what, what is that going to do to performance? How in the
Starting point is 00:29:08 world am I going to trace how many hops it takes to get from A to B and so forth. So there's definitely a lot of, um, you know, big ideas, but, you know, maybe not as much operational experience as to, you know, how ready this stuff is. Well, in case also Brian, the other Brian, we should probably add links to open tracing and also the diamond trace page on open tracing because diamond trace does support open tracing and so they will be interesting so these are good things to for our listeners to read up so educate yourself around service measures as you say 2018 we may see you know that kind of concept being pushed more, open tracing. From a monitoring perspective, again, because we are obviously trying to figure out topics
Starting point is 00:29:54 that are relevant to the performance community, to the monitoring community. Any other things that you have seen that monitoring tools can do or maybe can't do right now? Maybe some input for us to say, hey, guys, you know, there's so many monitoring vendors out there. But it seems there's still certain things that are just not done well or watch out for that because that's going to hit you in case you don't catch up. Is there any – I know it's a tough question, but is there anything? Yeah, I think we're still very much seeing people that there are very few kind of all microservices greenfields. So people are trying to figure out – I have – if you sort of trace the path of an application that they're building, it's going to be some mix of Java EE applications, maybe a couple of microservices, say on the front end or, or a mobile application. And then it, you know, may have to deal on the backend with, you know, like a, like a mainframe transaction. And so I think they're, they're trying to figure out, um, like one topic that, that, uh, one conversation I had was, um, you know, how do
Starting point is 00:30:58 I deal with an environment where say some of the application is running on containers on top of Kubernetes, uh, but, but the, you know, other parts of it are, you know, a bare metal database or a third party API service, say for like SSO or something else, like what's the way to think about that from a monitoring perspective? And I think people are looking for just some some visual concepts. They're also looking for, you know, is this going to impact me organizationally? You know, when you have such a kind of a hybrid environment in terms of, you know, brownfield applications and some new stuff or stuff that's on a platform and off a platform. So that's, that always kind of becomes the real question, less so than like, do I need this specific technology? It's, you know, do I only need one technology?
Starting point is 00:31:45 Do I need three or four, you know, three or four tools to do that? And, you know, people are looking for guidance in that a lot. Well, I think that's, I mean, hopefully in that conversation that you had, you said, well, Dynatrace is the tool that can actually do that. Because that's actually what we are, you know, what we obviously see with our enterprise customers, as you said, there's a lot of legacy applications out there, and they will become the backbone for some of these new apps that you're building. And therefore, when we designed our current architecture, we made sure that we can trace transactions across different data centers, across cloud providers, across technologies. And what you mentioned in the beginning, right, we still have a lot of customers that have a mainframe somewhere in the back end. And connecting that with the distributed world or with the cloud native world becomes a key requirement.
Starting point is 00:32:36 And that's what we at least solved with our one-agent technology and our PurePath technology. So, well, that's good to know, though, yeah. Yeah, and I think once technology. So, um, uh, well, that's good to know. Yeah. Yeah. And I think once people get a sense of, okay, I have the right tools in place that the next question always becomes, um, you know, what do you recommend, uh, in terms of my organization? Like who, who should be doing, what should I, should I leave, you know, kind of, should I retrain my current ops team? Do you have best practices around, um, maybe evolving what the, the org structure? Cause I think people are beginning to kind of realize like what's your org structure look like will
Starting point is 00:33:14 ultimately kind of impact how you do ops, how effective you are. And, um, so I've seen a lot of people that, that may have a mix of Greenfield Brownfield applications that are now willing to say okay um you know just just trying to adapt this to my old silos doesn't work like what do you recommend so i'd love to hear from you guys you know beyond the technology part like how does the organization evolve well i i think there's there's different answers to that but i mean i can tell you an example that i believe worked pretty well, and that's actually our own internal transformation story. Because we've been around for several years, and we've migrated and kind of transformed to a new model where we took our traditional enterprise AppMont product that we deployed twice a year to our customers or shipped twice a year to our customers and they install on-premise to what we have now where we run both SaaS and on-premise and we ship feature updates every other week. So every spring gets deployed and we also do constant production deployments on a daily basis. What we learned, I believe, is that our development teams are responsible end-to-end for their applications and features.
Starting point is 00:34:25 That means they rely on a platform, which we call our pipeline and our orchestration layer, that is maintained and run as a product by our DevOps team. So our DevOps team owns the pipeline and owns the orchestration engine, and developers are basically using that product that allows them to push code changes through the pipeline into different environments all the way into production and then developers are also responsible for what happens in production that means they obviously want to know in case something fails they they have full access to the monitoring and then they are tasked to tasked to obviously fix in case something is wrong. And that transition wasn't easy.
Starting point is 00:35:10 And I don't want to go through all the details because I believe our listeners have listened to us talking about our transformation story. But I believe we saw a big shift towards enabling individual developer teams, give them the right platforms so that they can focus on what's important and that's creating value for the business but also maintaining that value, which is making sure that these apps and services keep up and running and working very close with business to figure out what the next big things we need to build.
Starting point is 00:35:39 And we kind of went away from the traditional ops team. So there's no, at least in our engineering team that is running and operating the Dynatrace platform, there's no traditional operations team anymore. Interesting. Yeah, that's very similar. So at Red Hat, the OpenShift platform comes in a couple of different flavors. One is software that we will ship to customers and they can run it and operate it anywhere they want. We also host a couple of managed services or SaaS services.
Starting point is 00:36:08 And we've kind of gone through that same transformation. It was, you know, I think originally we had the model of like, well, we'll give that SaaS team the same software every quarter or so. And we're now to, you know, about every three or four days giving them, you know, small partial releases. And they're going through the process of, of both, you know, like you said, doing continuous kind of integration to the SAS application, learning how that works, kind of blurring the line between the engineering team and the ops team. And then, you know, we just got off a call this morning with them and they were talking about, you know, trying to really be good at building tools, automated tools to allow low level self-service, you know, very granular kind of spin up of clusters of things. And but yeah, once you once you force your teams to do that, the feedback loop that they get in their learning
Starting point is 00:36:56 curve kind of takes off, you know, to accelerates like crazy. There's some there's some painful times early on, but the feedback for us has been really similar. And it sounds like it's very similar to your story. Yeah. I got another question. So obviously it is easy or easier for green legacy apps, over to a new concept like containers and Kubernetes? Yeah, so it's a great question. you know, we, when we first kind of went down the path of containers and platforms and stuff, our expectation was, you know, people wouldn't have any interest in, in moving stuff because, you know, moving existing stuff, because the, you know, the thought process was, well,
Starting point is 00:37:54 you'll have to adapt it to the systems and, and, and the cost of doing that. And, you know, are the, are the existing developers still around and all those things? What's been sort of surprising to us is a lot of our customers, you know, beyond just the, the, you know, new kind of microservices, they're building something to change their mobile application or kind of update their kind of customer experience front end. A lot of them, especially with containers have said, Hey, you know, in order for us to make the dollars work, the ROI work of this, these platforms, you know, in order for us to make the dollars work, the ROI work of these platforms, you know, can I move an existing application? So they'll, you know, they'll say, well, you know,
Starting point is 00:38:29 I currently have an application that today just it's a, you know, it's a Java EE application, for example, maybe it runs in JBoss or WebSphere or something. Today runs on Linux fine. Could I put that in a Linux container? And while you go, well, it's, it's, you know, it's kind of a big monolithic thing. Does that really make sense for, for containers? Because people talk about them as being for microservices. We've actually seen a lot of people do that. And the reason being, now you have this common language between the development team or the application team who says, this is how we package it, this is how we test it. And the operations team who says, okay, here's, here's an immutable way of packaging the application. We have immutable infrastructure. We can kind of build a more modern operation around that. And then they, you know, in some
Starting point is 00:39:15 cases they'll, they'll kind of monitor it in somewhat similar ways that they did before, you know, agent based and so forth. But we've been, we've been really kind of pleasantly surprised at how well containers and Kubernetes are working, like for stateless applications and being able to sort of lift and shift applications, you know, to a point where we have a lot of companies that are, you know, doing that today. And, you know, the benefits aren't so much just like, oh, I mean, they get a little benefit that seems sort of like operational efficiency, virtualization type of stuff. But more so it becomes this forcing thing of, OK, my dev team now has this this one common terminology and process around packaging and testing. And then the dev, you know, the ops team has sort of similar and it's helping them with those, you know, sort of dev and ops transition.
Starting point is 00:40:04 And so that that's been really interesting to us. Yeah, that's cool. I like that. So kind of exposing developers in a kind of – I mean we're exposing them to the ops world without having them to completely transform also their apps. Right. Yeah, that's interesting. So I had a workshop last Friday with one of the bigger consulting,
Starting point is 00:40:31 IT consulting companies. And so all of what they are selling right now, at least the team that I worked with, it's all cloud transformation, cloud native transformation. And one of the things that they got very excited about was actually the concept of not only lift and shift but really the kind of breaking the monolith um and so we talked about how can we break the monolith how can we how can we not only move over to container technology
Starting point is 00:40:58 but actually then leverage the fact or the capability of of scaling up individual pieces of the app so we have to always break it apart. And one concept that I introduced them to, which is something that I think APM enables, so monitoring and especially the way we built it, you can install, let's say, Dynatrace on a monolithic app, and then you can draw virtual boundaries around your certain interfaces or your certain methods and classes.
Starting point is 00:41:26 And then you can observe your monolith and figure out how are these kind of components within the monolith talking with each other? What are the real dependencies within the monolith? And which allows you to test your assumptions on, if I have this monolith and my developers tell me, here's a component that they believe they can extract well with the monitoring tool that actually does tracing and season to the bytecode and analyzes who is calling whom on a method to method and component to component level
Starting point is 00:41:55 you can either you know completely destroy these assumptions that they had or say yes that's actually a good a good component we can extract it. We can try to move this out into its own instance or entity and then run it as a separate service and then scale it up and down independently. So kind of breaking the monolith, even though I'm pretty sure it does not work with all applications out there, obviously not without a certain effort.
Starting point is 00:42:22 I think the first step towards figuring out, can we migrate it is actually taking the monolith and kind of shining the light on that monolith while it's running and then figuring out what are kind of the individual components that it could potentially extract. And I think that was a concept that they really liked because you don't have to necessarily do any code modifications to run that kind of exercise prior to extracting the monolith into smaller pieces. Yeah, no, I like that.
Starting point is 00:42:53 A lot of the workshops that I've sat in for similar stuff, how to break up the monolith or how to strangle it, gets into – gets into, you know, you have to have expertise in domain driven development. And then you have to sit down with your business leaders and prioritize, okay, do we really need to be dependent on this? And, and at the end of the day, you, you know, you get some whiteboards, or you get some discussion, but yeah, having, having a tool that will actually give you a sense of like, okay, where, where's your links? Where's your dependencies? Where's your, where's your priorities? I love that. That's a great first step. And like you said, it literally costs you almost nothing to kind of have real data around, you know, what, what, what you could do. And then you can start to figure out, okay, what makes, what makes real sense. And, um, you know,
Starting point is 00:43:37 how do you put business metrics against saying, Hey, do we rewrite something or decouple it? So that's awesome. I like that story. Yeah. and I think if you actually take it a step further, maybe I need to suggest it to our development team because we see so much information within the monoliths. We could run our AI, our machine learning on top of it, and then instead of us coming up with assumptions and drawing the virtual boundaries, the tool itself can say,
Starting point is 00:44:01 we figured out a certain part of your code that kind of is independent and the only kind of touch points with other parts of the code are through these two or three interface methods so that would even be cooler if you are thinking it about thinking it around that okay all right well this is my little self an idea yeah i know that's great yeah um brian anything else that you want to that you want to mention about kubernetes open shift i know we guys we have a partnership with redhead um anything that you want to additionally mention that could be interesting for our listeners for our community that are, as I said,
Starting point is 00:44:45 centered around monitoring Dynatrace. Obviously, I think a lot of our listeners are kind of familiar with Dynatrace. Anything else that you want to tell? Yeah, I'll throw out two quick plugs because I think we've talked about a ton of things and maybe overwhelmed some people. So OpenShift is obviously our
Starting point is 00:45:06 implementation of Kubernetes. It's very enterprise centric. So a couple of things, and we can put these in your show notes. I know we've done a couple of kind of webinar, demo webinars together. We call them the OpenShift Commons community. So if people kind of want to see what that interaction looks like between, you know, your system and what it looks like at a Kubernetes system, um, there's a couple of really nice demo videos that show that and we'll make sure those links get in there. Um, but even, even at a simpler level, if we kind of go back to the Brian Wilson index of, you know, how do I learn this stuff? So we, um, we, we, we work with this really awesome company called Katakoda, who's a training partner. And if you go to katakoda.com, K-A-T-A-C-O-D-A.com, they've got really, really nice tutorials set up. So you go in through the web.
Starting point is 00:45:59 They have environments already built for you, so you don't have to mess around with anything. And then they have a bunch of sort of pre-written tutorials, everything from Kubernetes basics, Docker basics, you know, other things in there. And then we've built a kind of a custom one for OpenShift. So if you go to learn.openshift.com, there's probably about 10 or 12 modules there that'll walk you through all the basics of, you know, setting up applications, monitoring applications, scaling them and so forth. So if you're, if you're like, if you're like other Brian and you're wanting to learn this stuff, it's a, it's an awesome first resource and costs you nothing. And you don't have to have any tools besides just your laptop and a browser. I'm, I'm bookmarking it right now. Yeah. Awesome. That's pretty cool. Thanks for that. So, yeah, it's a, it's a fun space. It's growing really quickly.
Starting point is 00:46:51 You know, it's a really good, healthy community, so very accepting and, you know, lots of people willing to help and stuff. So, you know, for people that might be interested in this stuff, you know, jump in. The water's warm. And you can definitely find people through, you know, Slack channels and other things to help you, whether you're a newbie or you've got kind of an advanced problem. That's great. Cool. The other Brian, anything else from your side before we wrap it up? No, I mean, to me, this was a lot, as you know, these are the things I'm going to be diving into. So listening to this was more learning for me than participating as you could hear um just for anybody who is
Starting point is 00:47:27 interested in following my path of of learning um obviously you can get doc you know docker's free obviously and you can go down with the kubernetes but what i'm just so if if there's a curiosity what i'm using for my model is we have up on the Docker community, our demo application, easy travel. It's, you know, this hokey little, that's actually pretty, pretty advanced. But you know, in terms of the technologies it employs, but it's a it's a we have a Dockerized version of our demo application out there, which you can then run. And so that's what I'm going to plan on running through the ringer with just for standalone Docker, setting it up, then setting it up through, you know, using just straight up Kubernetes, and then seeing what I can do with
Starting point is 00:48:15 all that in OpenShift. And I'm sure there are other applications. But the nice thing about easy travel is you have multiple tiers. So it gives you, instead of just doing something like spring music, you have a multi-tiered application that you can run and play with these things. So that's about all I can contribute to this today is to say if you do want to play with it and you're looking for a multi-tiered application to use up in,
Starting point is 00:48:41 if you search Dynatrace in Docker, you'll find our demo application. And it's got nothing to do with, you know, buying our tool, using our tool. The application runs as its own without anything else. But of course, you can check out the tool as well. That's all I have. And I want to say thank you to Brian for joining.
Starting point is 00:48:57 But I think, Andy, I think it's time to summon the Summaryator. Well, I want to keep it short today. I think what I learned is that the big next thing we will see on the Kubernetes front is making it easier them, that they have a good first experience when putting, let's say, either a prototype or some simple apps on Kubernetes. And then I'm sure people will get stick to it. We also learned that Kubernetes kind of surpassed all of its other competitors. It seems it's the number one choice when it comes to orchestrating of containers.
Starting point is 00:49:46 We also learned today that in case we want to predict the future of technologies, we have to summon the Brian Wilson indicator. We have to look at the Brian Wilson indicator to figure out what's going on. And I was also very happy to have a little excursion around monitoring, how to to monitor I think we are and obviously this is
Starting point is 00:50:08 a podcast that should not be too focused on tools but it seems we at least with the stuff we're doing on supporting open tracing and the way our one agent works giving us full intent visibility was a good way the way our
Starting point is 00:50:23 developers integrated it or built our new Dynatrace platform. And having that said, thanks, Brian, for being on that show. And hopefully you'll have a lot of listeners in the future for your podcast. We'll definitely make sure that you guys get mentions and the link gets back to your podcast. And if you ever want to come back to the show or even bring Tyler as well the next time, you know, you're always invited because it's a great way to educate the larger community out there. Yeah. Thanks, guys. It's been enjoyed the conversation.
Starting point is 00:51:01 And, yeah, we'll definitely have to get Tyler on next time. We'll work out the scheduling better, but thanks for having me on and hopefully everybody has a great holiday season and hopefully we'll get a chance to talk to you guys in 2018. Right. Excellent. Cool. Thank you. Thank you.
