PurePerformance - Successful Enterprise Monitoring Projects with Kayan Hales

Episode Date: September 21, 2020

Successful cloud migrations, large-scale Kubernetes & OpenShift deployments, making billions of data points actionable, and enterprise-wide Citrix & SAP monitoring. These are some of the projects Kayan Hales, Technical Manager at Dynatrace, and her colleagues at Dynatrace ONE help enterprise customers around the world implement every day. We sat down with Kayan because we wanted to learn what really matters to many large organizations as they embark on automating monitoring into their hybrid multi-cloud environments. While we constantly talk about cloud native and microservices, it was interesting to hear what the global team of Dynatrace experts is doing on a day-to-day basis. Kayan gives us insights into how important it is to think about metadata, tagging strategies, and automation before large-scale rollouts, and that one of the first questions you need to ask is: who needs what type of data, at which time, through which channels?

https://www.linkedin.com/in/kayanhales/
https://www.dynatrace.com/services-support/dynatrace-one/

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome back to yet another episode. It's always another episode of Pure Performance. Welcome to another episode, Andy. How are you doing on this another episode today? It's a great another episode. I wonder if I would start with saying this is not another episode,
Starting point is 00:00:44 but it's something completely different, and you tuned in for the wrong podcast. And now for something completely different: how's everything going for you, Andy? It's very good. And as I just told you in preparation, today is a special day for me, and I know it's going to be a special day coming up also for our guest in a couple of days. I got a special day coming up for me too. Special days all over, all over. Yeah. No, other than that, everything is good. As of today, after recording in early September, summer has accidentally disappeared over the weekend here in Austria, at least. It was warm until Saturday, and then a cold front came through, and it seems that's it with the summer. Fall is here. Kaput. It's kaput. Yeah, exactly. Yeah, we actually had a bit of a cold.
Starting point is 00:01:28 We're getting cold nights now and then here in Denver and I looked at the weather forecast and we're going to have a day in the 50s next week. But that's Fahrenheit.
Starting point is 00:01:37 I don't know what that is to Celsius because, you know, I'm a pig-headed American who can't be bothered to learn the metric system. Don't you have your days in your 50s every day?
Starting point is 00:01:46 I'm not that old. Come on. Come on. Anyway, we have a guest. A lovely guest, actually. And I'm very happy to introduce her, even though I will keep it short because I'm pretty sure she can do it much better than I. Kayen Hales.
Starting point is 00:02:04 I'm actually, I guess i jinxed it earlier because i'm definitely screwing up pronouncing her name um but i know that's what it is um it's it's it's phenomenal having her on the podcast because i remember meeting her a couple of years ago when she was a i think a guardian onsite with one of our clients on the North. I think it was Northeast. And sat down, explained the Dynatrace product or helped hopefully, explained the Dynatrace product and then helping the customer.
Starting point is 00:02:37 And it was very clear to see that while she was still, I think early early in the job. It was amazing how she kind of took the technical ideas that the product realized and then pushing it over and helping our customers implement it. And now seeing her in a management role within Dynatrace for the Dynatrace One Premium team, which I think is really exciting. But now I am stopping talking about her, because I'm sure she can do this even better. Hi, how are you? And maybe you want to quickly introduce yourself
Starting point is 00:03:09 to the audience. Thanks, Andy. And I'll just say you almost got my name right. Well, you tried. So I applaud the effort. But for the audience here, my name is Kayan Hills, as Andy mentioned. I did work as a guardian at one of our accounts there in Northeast, as you mentioned. And that actually was my first account and probably the toughest account and the best account. So, yes, I am in management currently. And I do help the technical folks who report to me with their customers doing similar jobs where we're helping customers out, trying to get them to use the best monitoring practices. And it's also cool. I'm just looking at your LinkedIn profile. You started through the PDP program at Dynatrace back in 2014.
Starting point is 00:04:03 I did PDP 319, which was the best class, no matter what anyone else says. Yes, I did start through the PDP program. What's PDP? You got to remember, this is all inside jargon. I don't even know what that means. Right, right. Yeah. So PDP is the Professional Development Program, where they bring you in young and fresh and wide-eyed college grads, bring you in, train you in the technology that we use at Dynatrace and also in the technology that we support. And then they let you loose and you're up to the customers and to enable them to use the product successfully. I didn't even know we had that. That's awesome.
Starting point is 00:04:43 And I think the cool thing, that's also the reason why, you know, one of the reasons why we wanted to bring you on the call is because you have seen firsthand on site with, as you mentioned, one of the toughest accounts we had back then. And kind of really seeing what problems people actually want to solve in the performance space. So today's talk is not about Dynatrace. It's not about the different teams that we have to help our customers. Well, obviously, we want to make sure that everybody understands that we have a PDP program to educate our folks, and we have a Guardian program where we send people on site. And now you're part of Dynatrace One. So we can talk a little bit about it, but it's really interesting to see and hear from people like you, what are the real challenges that large organizations really have when they are reaching out to monitoring vendors like Dynatrace
Starting point is 00:05:41 to say, hey, we need help. Because in the last couple of months, Brian and I specifically focused on CloudNavid. Everybody's talking about Kubernetes. It's the cool new thing. And I'm pretty sure we see this a lot. And this is where monitoring helps. But I know there's much more out there. And this is the reason why we really wanted to get you on the call and get a little overview of what else is happening out there.
Starting point is 00:06:04 What are people trying to solve and what challenges do they have as it comes to performance engineering monitoring alerting and all these things that are relevant for us so um yeah and the question is go ahead yeah the question is are you up for the challenge can we can we ask you a lot of questions yeah absolutely i'm up for the challenge okay so the first question that i have for you or the first kind of area and i know we talked about this in preparation is that uh it seems now we talk a lot about the cloud yet a lot of people are not in the cloud yet uh i think that we see a lot of organizations
Starting point is 00:06:47 that are trying to move into the cloud. Cloud migration is a big topic. Can you fill us a little bit in on what you see out there, especially as people are moving into this new world and what you see there where people are getting challenged or what problems they want to solve? Right, definitely. Yes, I've definitely seen customers make that effort. are getting challenged or what problems they want to solve. Right, definitely.
Starting point is 00:07:08 Yes, I've definitely seen customers make that effort, some customers quicker than others. We've seen some customers where they're more interested in the next shiny object. It's like, oh, yes, we have AWS. Let's just go straight there. Or, okay, now Azure has this new feature. Let's just go there. So they're kind of all over the place in a sense.
Starting point is 00:07:27 Then we have other customers or this other camp of customers where everything's very rigid. Where it's like, no, we can't change our app, no, we don't want to make any changes. Yes, we know the Cloud is this great big thing, but we don't want to do it because we don't want to mess up our apps. What I found is the customers that are more rigid,
Starting point is 00:07:47 it's because they don't understand their application. So they do look to us to help them figure out what exactly is their app doing. And then the ones that are more attracted to that shiny object, it's just exactly as it is. It's like this new feature is promising to help them to do this thing. And so they just want to try it out. But we do have a customer that we've worked with recently
Starting point is 00:08:11 that did a mass migration to AWS. And this customer is such a great example because they actually had a whole strategy behind the migration. So they had Dynatrace installed before that, but they had their own strategy in-house, right?
Starting point is 00:08:31 A strategy that they worked out internally. So they talked about cloud governance, how can they optimize the network? How can they leverage the support that AWS is providing for them? And make sure that they work directly with AWS in order to do that migration. So that when they bring in monitoring tools, we just serve to kind of bridge the gap between what's in the legacy environment and what's in the cloud. For example, doing that comparative analysis.
Starting point is 00:09:06 So okay, your response time is, let's say 10 milliseconds in your legacy environment. We want it to be that or better in the cloud environment. So you want to pull out your monolithic app into those different microservices and have a strategy for that, bring it into the cloud and make a strategy for that, bring it into the cloud and make sure that the requests from the legacy into the cloud environment are lining up and that the response time lines up that way. So I
Starting point is 00:09:34 think customers that actually have a well thought out plan are definitely more successful and they use monitoring tools to their advantage to do that comparative analysis between performance on legacy platforms and performance in the cloud hey and this is you brought up this example with you know monolithic applications on-premise moving to a microservice architecture in the cloud have you seen this always the case that people are actually really kind of replatforming and re-architecturing the app and breaking them up or do you also see examples where they just still do the traditional lift and shift and i would i would assume that you know the requirement would still be the same you want to make sure that the
Starting point is 00:10:21 application runs you know as least as fast if not faster in on somebody else's hardware but do you see maybe from this customer or others more really the full we are re-architecturing for the cloud or do you also see these examples where they're just lifting and shifting yep i've definitely seen both. Going the microservices way is recommended. I think that's the best way, but we've definitely seen customers that they just want to do a one-for-one migration. They want to take the entire monolithic app and put it in the Cloud.
Starting point is 00:10:58 That tends to take a long time. I do remember one of my very first accounts, that customer tried to do that lift and shift that you were talking about. And two years into the account, it still did not happen. Just because to move an entire app that is so big all at once and having so many teams involved, so many people that are interested in different performance metrics and so on. And the timing has to work out right. The change records have to be in correctly.
Starting point is 00:11:30 The firewall rules have to be in place and all of that. It tends to delay things. So definitely we've seen both, but I do think going the microservices route is recommended. Yeah, Andy, I think we saw this or we had someone on from AWS services on a while ago. I think that was the conversation
Starting point is 00:11:50 where their whole thing was they won't even lift and shift. Their whole thing was like, our services are here to help you move to the cloud but convert over to anything
Starting point is 00:12:04 else but lifting and shifting your monolith does that ring a bell andy i kind of remember that conversation it does yeah same point came out like uh kane you said you know it's two years and it's still not quite working and in two years you could have had some or all of that converted over to something possibly more efficient, only leaving the monoliths for things that made sense, but everything else you could have done maybe more efficiently, maybe more cost-effective. Two years is a pretty good length of time,
Starting point is 00:12:41 and to still be struggling with a shift after two years, I can see why people recommend against it, because it just seems like you just lost two years, you know. Right. And the thing is, you can still have a functional app being hybrid. You can still have portions of the app in legacy, portions in the cloud, and still be hybrid. You just have to break up the application into the logical functions, and that creates the microservices, and put those in the cloud or make that gradual change from legacy into the cloud using microservices. And a lot of customers will actually see better performance in the cloud just because those apps are broken up and you're able to function
Starting point is 00:13:19 or I should say focus on that specific part of the app that may not be working. And to say the least, if you're moving bits and pieces, you're taking small chunks at a time with lower risk. You're learning parts of the process as you're doing that. So as you get to more of it, you have so much more experience. It almost feels like the parts and limit is that was that the company from the phoenix project project that was going on and on forever as they
Starting point is 00:13:49 were trying to get it all right you know it's like a little bit here at a time and those will work and then you'll come back you know later on and make some better tweaks to them that you learned on the the last parts but yeah anyhow sorry it just really struck me when you were talking about that it It's great. Thanks. Yeah, absolutely. And I do think, you know, the benefit of doing a little bit at a time is, you know, you do get the opportunity to understand your app. Because like I mentioned before, some of our customers, they don't quite understand what their app is doing.
Starting point is 00:14:26 So when you break it up into those functional bits and pieces, they actually get a good understanding of what their app is doing and say, oh, this part of the app does this and this part of the app is, you know, for user experience and so on. So I do think there's that benefit there. And I think, Brian, I think you mentioned earlier a podcast we had in the past. I remember that discussion and I believe one of the arguments from against lift and shift was if it if you just lift and shift an application into the cloud then
Starting point is 00:14:52 you don't really reap the benefits and maybe then people realize well why did we even lift and shift it's not faster than before well how should it be faster are you just running on a different hardware stack and i think that was also the argument of people want to have a good experience with moving to the cloud, but in order to reap the benefits of the cloud and having a good experience is that you are actually adopting
Starting point is 00:15:13 to these new architectures, leveraging the elasticity of the cloud, leveraging cloud services for things that you may have built and operated in-house earlier, like a database, right? I think that was also one of the reasons why they definitely favor re-architecture and re-platforming apps as they move to the cloud. One question, though, you said you compare before and after.
Starting point is 00:15:39 Is this, were you working then closely with performance engineering teams, load testing teams that were actually able to simulate and put load on these systems simultaneously? That means you still had the old system and you had the new system in place and you were comparing and then you gave the go when you were confident that the new system is faster? Or was it done more like you had data from the old system, you moved over that the new system is faster or was it done more like you had data from the old system, you moved over to the new system and then started comparing on production traffic and then kind of making final tweaks to the new cloud-based system? Yep, so I'll say that what I've seen one customer do,
Starting point is 00:16:22 which is separate from the one that we were talking about earlier, but what I've seen one customer do, which is separate from the one that we were talking about earlier, but what I've seen one customer the application and do another load test on the same day at the same time so they can get comparative data for the performance so i i've seen them do that in the lower environment first so they may start off with the development environment and if that looks good if everything looks okay then they move on to, say, the test environment or the load testing environment, and then they'll move on to production. So that's the method that I've seen so far for customers migrating. And that has worked out so far. The performance metrics that they track would be obviously response time, but also failure
Starting point is 00:17:21 rate. CPU consumption is another big one, and any errors or exceptions that's coming in would be those that they would typically track. Now, moving to the cloud and being in the cloud, do you see additional data that people look at from a performance engineering or monitoring and learning perspective like are there are there new uh i don't know new metrics new something new or different to
Starting point is 00:17:52 the on-premise systems that people always ask for like hey i need to have visibility on this because otherwise i fly blind or so. I've actually seen it the other way, where on-prem, they're looking at so many metrics, trying to build so many reports for upper management because, at least based on what I've seen, they don't quite understand the app. So they're trying to just produce many, many different metrics and try and figure out,
Starting point is 00:18:25 okay, since this metric is down, that means something is wrong or doing it in that approach. Whereas when they move to the cloud, they actually focus on just the metrics that are important and be able to use that to make decisions that affect the application. I've actually seen it the other way where they actually end up using less metrics when they're in the application. I've actually seen it the other way, where they actually end up using less metrics when they're in the cloud. Yeah. Andy, I was just thinking with that whole, or in K2, sorry, but this whole idea of moving to the cloud
Starting point is 00:18:57 and changing the metrics kind of falls into the SRE pattern, where if you think about at least the Google defined SLIs and SLOs, the idea is you're not looking at things like CPU. You're not looking at things like exceptions. You're looking at uptime, response time for the end user. You're looking at what the final product is looking like to determine then if you need to take action and look at stuff in the back end. And I think the cloud, moving to cloud,
Starting point is 00:19:33 moving to these scalable architectures allows you to do that easier because if you're on-prem and your own data center, you have limited resources. You have to maybe rack a new server, try to fine-tune your VMs to get something out if an emergency happens. Whereas while we know we don't recommend throwing hardware at the issue in the cloud even, at least you have that scalability and it's not as much of a scarce resource where you're looking for every little metric and every little number to indicate,
Starting point is 00:20:03 oh my gosh, something might fall over and what are we going to do? Because you have a lot more options. Obviously, the best option is then to go ahead and fix what's causing the problem. But you have, I'd say, the luxury maybe of focusing more on the end user and less on a CPU number, which makes things, in my opinion, at least gives you more of a customer-based focus. Yeah. I would have thought from it, I would have explained it from a different perspective.
Starting point is 00:20:33 Because, Kayan, you said that typically people on the on-premise side don't really know what these apps are doing. So probably they've been around for a long, long time maybe performance wasn't that uh important back then right and nobody really thought about what's actually important so in order to get some type of visibility then well let's get everything we can every single metric and then we figure out which metric might tell us something but still we we collect everything because we don't know better and it was never a requirement to actually define what's important and i believe now and brian this is goes definitely to your point now when you are architecting and developing cloud native applications then one of the
Starting point is 00:21:17 requirements is to figure out what are my slis and my slos that are important to me because they're important to the business? And maybe what handful of other metrics do I need from a technical side to get early warning indicators that something is wrong? And so instead of doing the shotgun approach of getting everything of an unknown system because I need to figure out if case something is wrong, then I want to have all the data.
Starting point is 00:21:42 I think we're moving more towards a, I know exactly what's important to me because we thought about this from the start and therefore these things may get easier even because it's more well-defined. Yeah, exactly. And I definitely agree that scalability is very important and also capacity because once you
Starting point is 00:22:06 move into the cloud, you can put in some automation where if the throughput or if the load is at a certain level, then you can add in another node automatically. So those types of things are what customers tend to look at as well. Very cool. So we talked, can we go back to the, talked about a lot of data. I know you have in preparation, you know, we looked at some of the things that you've been doing and the kind of the belief of we need all the data in the world is obviously something that you know people may feel more more safe if they have a lot of data
Starting point is 00:22:52 but i believe just collecting data for the sake of collecting data so we have data in case we need data but we actually don't even know what the data tells us and we and and who even needs that data is obviously a problem that, you know, it's not actionable, let's call it that way. And I believe, I know you have experienced this with some of the organizations we work with. It's just too much data, but it's not really actionable. Is there kind of a trend that you see,
Starting point is 00:23:22 or is there maybe certain organizations that are better in actually defining and asking for actionable data versus others? And also if you approach people and you find out there's just too much data, how do you make it actionable? So I definitely have worked with customers that they just want all of the data for
Starting point is 00:23:47 sure. But as we look at the different requests that's coming in different transactions that's coming in, there really isn't a need to look at each and every single one. Right? You what what what you want to do is identify any patterns or any trends over time. So you do that with visual reporting. So you can use dashboards or any other type of visual reporting
Starting point is 00:24:11 to present that data to the forefront to say, okay, my application was performing consistently poor over a week. Or you could even look at it from a comparative analysis to say, okay, last week it was an acceptable response time, but this week is not. So it's not necessary to grab all of the data, but it's more important to analyze the data over a period of time to identify patterns. And when you work with these teams,
Starting point is 00:24:50 do they typically know? Like you mentioned the term constantly slow or acceptable performance. Does everyone have a clear definition of what metric they actually would look at to define performance and what their acceptable threshold is and the criteria or is this something that is not always given that people actually know what's what I would say most customers tend to be reactive where they're only looking at the data, even though they they may try to use that as a baseline, but it's not the best baseline, right? It's not the best analytics that they can do. So, what we try to encourage them to do is, you know, look at the trends, look at the patterns, and have the most traffic. Let's say, for example, if it's an e-commerce site, right, it would be Black Friday, for example. So you would look at that time where you have the most traffic and use that to measure your performance.
Starting point is 00:26:15 So if your application can handle that much load, then it's time to scale up or time to consider capacity, time to look at different ways to optimize your code in order to handle more. When you work with these accounts and they ask you we need this type of dashboard we need this type of data i mean i know you're advising a lot because you see you're doing this for many accounts and i know you have we have a lot of different diamond trace one teams that are doing this and i'm sure you're sharing best practices but what i would be interested in we over the last couple of years we've talked a lot about data democratization, meaning opening up data to more people, making it easier to access, meaning giving developers data that they need, giving business, giving performance engineers, giving everyone access the the data we are capturing is actually put on dashboards and these dashboards
Starting point is 00:27:26 are then really used and circulated or made available to a large number of people and also different types of groups or is it still more that you are just we're just building dashboards for i don't know the platform team the performance But other than that, nobody gets access to that data. Oh, yeah, we definitely have cases where we build dashboards as visible to everyone that's logging into Dynatrace just to make sure that the performance of the application is visible for everyone. We definitely have some customers that would rather keep their dashboards specific to their team or specific to their group. But for the most part,
Starting point is 00:28:11 we have just general dashboards that can provide insight for everyone. So we also do funnel dashboarding where you have just one general dashboard, but then you can click in on the different components to open up like a sub dashboard or interconnected dashboards. And that has proved to be very, very useful because you can see everything that you need in just one space. You only have one link that you need to access with all of the metrics that may be important or relevant.
Starting point is 00:28:42 And we, of course, advise on that too. And then based on your specific role or what you're interested in the company, you can drill into that specific dashboard and look at the metrics there. So some may be role-based. So we do have charts and metrics that are specific to developers versus application owners versus database guys
Starting point is 00:29:09 which would be very useful for them so yes we definitely do that and and when you said you have to finally you said you started with saying we're building funnel dashboards now when i hear funnel i immediately think of klaus ensenhofer because he always talks about e-commerce conversion funnels in your case I assume you talk about a funnel more like as a journey from I have an high level overview and then I can either go into component ABC or maybe I'm a developer and therefore I'm interested in this is this what you mean with funnel or maybe maybe I just didn't get this right what do you mean with funnel yep that maybe I just didn't get this right. What do you mean with funnel?
Starting point is 00:29:46 Yeah, that is exactly that. So we have a overview, high-level dashboard. Maybe it's focused on a group of applications that's tied to a specific business unit. So they would go in and they would see the different, maybe one chart per application. And then for a specific app, they can drill in and maybe look at infrastructure metrics or application metrics or performance metrics overall. And then based on what you're interested, you can drill in to infrastructure metrics. Let's say you're specifically an OS guy,
Starting point is 00:30:19 you would drill into infrastructure metrics. If you're on a load test team, you wanna look at performance metrics. If you are focused on user experience, you'd wanna look at the application metrics. If you're on a low test team, you want to look at performance metrics. If you are focused on user experience, you'd want to look at the application metrics. And even going further, like for example, if we pick application metrics, you can go even further into that to look at how different geographies are performing for your application? What requests are users clicking on the most? How do you know if a user actually reached the thank you page on your e-commerce site? Those are the things that would be more of interest to those folks. Very cool. So that means we're advising or
Starting point is 00:31:00 you're advising on having an overview that basically aggregates data on an application level so that means if you have a little tile for an application it would be something like availability metric overall performance basically you know is it is it is it up or not i would assume right or something like that or does it make revenue and things cool. And then drill down from there. Do we share these practices outside of the Dynatrace one kind of group and advising customers? Is there any I don't know. I love my blogs. I love my YouTube videos. Is there anything where we share some best practices more publicly?
Starting point is 00:31:44 Or is this more just for Dynatrace One, for customers that are reaching out to Dynatrace One? I would say from our perspective more so for customers reaching out to Dynatrace One. So whether they do live chat or Dynatrace One Premium, they would get the same best practices. Perfect. And of course, I know you have your blogs and so on. Yeah. No, but it's great that you answered it that way, because that's a call to every
Starting point is 00:32:13 Dynatrace customer that is out there. Please either open up your chat in Dynatrace and then reach out to the Dynatrace One team and say, I have just heard this blog post and I heard this podcast and we should talk about some dashboards, best practices. That's good to hear. Perfect. Yep, absolutely. Coming back to the cloud migration and the topic of re-architecturing for the cloud as we did lift and shift. I want to touch upon a topic that, as I said earlier, Brian and I have been
Starting point is 00:32:52 doing a lot of sessions or episodes on recently, the whole Kubernetes, OpenShift, Cloud Foundry, the whole new container-based platforms. I do assume that we do see this a lot in our accounts, do we? Absolutely, yes. Definitely, we initially saw a lot of Docker, but lately we've been seeing a lot more Kubernetes in our environments with the customers that we work with. And do you see, is there a, I don't know, a trend towards Kubernetes on-premise or running it in the cloud as a managed service?
Starting point is 00:33:32 Is there a trend that you see of Kubernetes versus OpenShift or Cloud Foundry? Is there, depending also on the type of organizations we work with, any kind of clear indicator? Or let's say if a customer knocks on the door and, you know, they are in healthcare, would you immediately assume or guess, based on data we have from others, that they are probably using OpenShift on-premise? Is there something like this we see out there? I would say for the customers that we've worked with so far, initially there was majority Docker and PCF, Pivotal Cloud Foundry. And then they would deploy using BOSH,
Starting point is 00:34:19 using the BOSH deployment, or also Pivotal Web Services. But lately we have seen a shift from that into Kubernetes. Kubernetes either in any of the cloud platforms, more so than on-premise with our customers. And we also offer monitoring to customers who use Kubernetes, either through runtime or build time, so that the application owners can control how they want their app to be monitored.
Starting point is 00:34:56 So it's definitely more hands-off where we let the application team choose what they want to monitor, but at the same time, we still help to provide the views that they need to see through dashboards and most definitely help them with alerting, making sure it's being routed to the right person, especially if they're using ITSM systems. So that means, are you then advising the application teams to, I don't know if they want to have, I don't know, pod-based monitoring or code-level monitoring, and they want to get the right dashboards, that they need to put some metadata on the containers, whether it's tags or environment variables that is then triggering or allowing the right dashboards to show the data?
Starting point is 00:35:43 Is this also what you advise, like how they should build the containers or how they should tag it? Is this stuff that you do? Yes, absolutely. So we do suggest certain naming conventions that they should use. Definitely need to add some metadata so that we can pull in the right information to populate the tags. And then the tags really drive everything.
Starting point is 00:36:02 So the tags drive the management zones, which drive the alerting and the dashboarding. So, yes, we definitely allow them to or suggest to them what kind of metadata they should put on their pods. And I remember now a discussion we had last week. Or was it this week? I don't remember. About, you know, we have Perform coming up in February and we're doing hot days, hands-on training days. And I believe you suggested, you know, let's do a hot day session on, I think you called
Starting point is 00:36:32 it pre- and post-OneAgent deployment considerations, best practices around tagging and metadata. Because I think this is really important, and we can all learn about how important tagging is and what the best practices are in these very large, dynamic microservice environments, correct? Oh, yes, absolutely. The first thing I tell a customer as soon as they deploy the OneAgent is, let's talk about tagging. That should be the first thing. Aside from adding in host groups, definitely. Because tagging is really what's going to drive everything else.
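The flow Kayan describes, metadata feeding tags and tags driving management zones, alerting, and dashboards, can be sketched in a few lines of Python. The label keys and zone rules below are illustrative assumptions, not a documented Dynatrace schema:

```python
# Hypothetical sketch: pod metadata -> tags -> management-zone membership.
# The label keys ("team", "app", "stage") and the zone rules are invented
# for illustration; real auto-tag rules would be configured in the tool.

def tags_from_metadata(pod_labels):
    """Turn selected pod labels into monitoring tags (key:value strings)."""
    wanted = ("team", "app", "stage")
    return {f"{k}:{v}" for k, v in pod_labels.items() if k in wanted}

def management_zones(tags, zone_rules):
    """Return the zones whose required tags are all present on the entity."""
    return [zone for zone, required in zone_rules.items() if required <= tags]

pod = {"app": "checkout", "team": "payments", "stage": "prod", "pod-hash": "abc123"}
tags = tags_from_metadata(pod)
zones = management_zones(tags, {
    "Payments Prod": {"team:payments", "stage:prod"},
    "Payments Staging": {"team:payments", "stage:staging"},
})
print(zones)  # ['Payments Prod']
```

The point of the sketch is the direction of the dependency: if the metadata on the pod is missing or wrong, everything downstream, zones, alerts, dashboards, is wrong too.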
Starting point is 00:37:06 And it's really important to make sure that those are auto tags because no one wants to spend hours creating manual tags. We want them to be auto tags, and we also want them to be relevant. We want to make sure tags are going to drive who needs to get alerted, who needs to see the views, what kind of views we need to create. We also suggest creating specific tags for a blue-green deployment as well, so that there is no confusion about what's active and what's passive. So tagging is crucial and there's best practices that we suggest for tagging your infrastructure
Starting point is 00:37:41 versus tagging your application services, versus tagging your application in terms of real user monitoring. Because all of that does play a part in the alerting. The alerting piece is incredibly important, so it's very important to tag your application appropriately. Kayan, it almost sounds like what you're saying is that monitoring shouldn't be an afterthought, but something that should be built in from the ground up, right? And I'm saying this kind of with a wry smile on my face, because this is the kind of stuff that we all understand. And besides hearing from you, we hear from a lot of people that this is something you don't just, oh, we're going to
Starting point is 00:38:21 buy a tool and toss it on. Even if it's Prometheus, no matter what you're using, right? You have to plan these things properly to really get the benefit out of it. I mean, you're talking about tagging right now, right? All the cloud providers have tags, you know, AWS tags, Azure tags, GCP tags. Tags are everywhere. Setting up for monitoring so that you could leverage these things is so critical. And I'm really glad you're bringing up this point, because it's just driving home the concept that, you know, you don't just do the old-fashioned style, this is going back years to when I was first starting, right, of get a monitoring tool, turn it on, you have five servers, you know, you can handle everything. Nowadays you can't, there's no way to know what's what. So this has to be baked in early, has to become part of the best practices.
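The blue-green tags Kayan mentioned a moment ago are a good concrete case: an explicit slot tag plus an active/passive marker means nobody has to guess which deployment is live. A minimal sketch, with made-up tag names:

```python
# Hypothetical sketch of blue-green tagging: a "slot" tag plus an
# explicit "state:active" marker removes guesswork about what is live.
# Tag names and deployment names are invented for illustration.

def active_deployment(deployments):
    """Pick the deployment tagged as active; fail loudly on ambiguity."""
    live = [d["name"] for d in deployments if "state:active" in d["tags"]]
    if len(live) != 1:
        raise ValueError(f"expected exactly one active slot, found {live}")
    return live[0]

fleet = [
    {"name": "checkout-blue",  "tags": {"slot:blue",  "state:active"}},
    {"name": "checkout-green", "tags": {"slot:green", "state:passive"}},
]
print(active_deployment(fleet))  # checkout-blue
```

Failing loudly when zero or two slots claim to be active is the whole value of the convention: the confusion surfaces immediately instead of during an incident.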
Starting point is 00:39:06 It has to become part of the thought process before you even start moving. Do you find on engagements, as you're planning those shifts into the cloud, whether it's a lift and shift or migrating to Kubernetes or whatever it might be, are you all participating, or finding people asking for advice, at the planning phases on how we're going to bake good monitoring into this? Or does that still come in a little bit later, but the successful ones are at least doing it before they go live? What's your experience with that? The successful ones definitely are the ones doing it before they go live, because we have had some customers where they go live with their application. They want to add monitoring afterwards. And then based on the results of the monitoring, they find that they need to re-architect their application all over again. So you do save a lot of time when you bake in monitoring from the very beginning.
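Baking monitoring in from the start often means putting the metadata on the app itself rather than configuring it in a UI after the fact. Dynatrace's OneAgent, for example, can pick tags up from an environment variable such as DT_TAGS; the exact space-separated key=value format sketched here is a simplified assumption:

```python
import os

# Sketch: tags declared on the app via an environment variable, so they
# travel with every deployment. The DT_TAGS name and the space-separated
# "key=value" format are assumptions simplified for illustration.

def parse_tag_variable(raw):
    """Parse 'stage=prod team=payments standalone' into a tag dict.
    Bare words become tags with an empty value."""
    tags = {}
    for token in raw.split():
        key, _, value = token.partition("=")
        tags[key] = value
    return tags

os.environ["DT_TAGS"] = "stage=prod team=payments blue-green"  # set by the deployment
print(parse_tag_variable(os.environ["DT_TAGS"]))
```

Because the variable is set by the deployment itself, any change to the app's metadata is picked up automatically on the next rollout, which is exactly the "no reconfiguring after the fact" point made later in the conversation.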
Starting point is 00:40:08 When you're setting up your app, you're thinking about, okay, how is this application name even going to show up in the tool? How should I name it appropriately? And how should I label my entities appropriately so that it shows up the way you want it in the tool so that you don't end up doing a lot of configuration after the fact? So naming conventions are really important. And just going back to tagging for a second, it's a case where everything needs to be tagged. So everything needs to have a label.
Starting point is 00:40:35 Otherwise, it gets lost in the wind, right? Can you imagine when you have five blank pieces of paper? Let's imagine that the blank pieces of paper are servers and you have no idea what each server does because you didn't label it. And, you know, humans, we forget. So it's very important that we label things appropriately so that we can come back and say,
Starting point is 00:40:54 ah, yes, this is what this does. So this is who it needs to go to. This is the person who's interested in this because of the label. And I would further suggest, too, that even if you're not moving to one of these complicated systems, you know, let's say you still have more of a legacy or, for whatever reason, a three-tier setup, or that five-server setup I was mentioning, it's still a good idea to do this labeling,
Starting point is 00:41:15 just to start practicing, so for when and if you do these migrations, you'll have that sort of ingrained. And, you know, people can kind of keep track of a small system, but getting those practices in and making it a habit early is a good thing, I think. Absolutely, yes, I agree with that. So big lesson learned for everyone: it's about organizing the data, right? In the end, you can collect data, data collection is easy, you know, there are so many tools out there that you can use, but it's about thinking how do you organize the data. And the organization happens through metadata, through tags. And, um, Brian, you mentioned AWS tags and so on.
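The label-everything point maps directly to code: an unlabeled host is exactly the blank page Kayan described, it answers no questions. A toy sketch, with invented host names and label keys:

```python
# Sketch: labels are what turn an anonymous server into something with a
# known role and a known owner. Host names and label keys are invented.

hosts = {
    "srv-01": {"role": "db",  "owner": "dba-team"},
    "srv-02": {"role": "web", "owner": "frontend-team"},
    "srv-03": {},  # never labeled -- nobody remembers what this one does
}

def who_to_page(host):
    """Look up the owner of a host; unlabeled hosts surface as a problem."""
    return hosts.get(host, {}).get("owner", "UNKNOWN -- unlabeled host")

print(who_to_page("srv-01"))  # dba-team
print(who_to_page("srv-03"))  # UNKNOWN -- unlabeled host
```

At five servers a human can keep this in their head; at five hundred pods the lookup table is the only memory there is, which is why the habit is worth building early.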
Starting point is 00:41:57 Are there best practices, from a metadata and tagging perspective, on these platforms like AWS, Kubernetes, OpenShift, and so on, that have been kind of established as a standard? Do you see this, that maybe we can all kind of learn about so that we don't have to reinvent the wheel all the time? Or maybe not that every vendor has to come up with their own best practices, but maybe there's something more, you know, maybe something has already been defined somewhere. I don't know, I think in Kubernetes there's at least some annotations that are
Starting point is 00:42:34 kind of, not de facto, but quasi-standard. But is there anything more that you see, any type of standards that are evolving around this? So definitely, I would say bringing in the tags that are already available in those platforms. With Dynatrace, we can definitely do that, as well as making sure that you set your metadata or your properties, environment properties, on the application itself, because all of that will be automatically imported, and then we can make use of the data that's already there, as opposed to having to go back and then reconfigure in the UI, in the Dynatrace UI or in the monitoring UI. It's best to just make sure that everything is configured on the app, so that if there are any changes, those will automatically
Starting point is 00:43:25 be imported. So that goes for metadata, for environment variables, properties, process properties I should say, as well as any tagging that's done on the cloud platform itself. Very cool. Just a reminder for everyone that is listening, right? I mean, these are obviously a lot of best practices that we have. And I want to highlight again, for folks that are interested in learning more about this and also want to do it hands-on, that Perform 2021 is coming up, virtual, but we also have hands-on training days. And that's going to be a hot day session
Starting point is 00:44:05 dedicated to all of these tagging and management zone topics and all that stuff. So definitely a shout-out to that session in case you're interested. I would like to talk about one more thing, or actually kind of figure out if there is one more thing to talk about, but I believe there is. Now, we talked about cloud migration, we talked about Kubernetes and the new cloud-native world, the platform world. We talked about a lot of data and making it actionable. But we talked a lot about these new technologies. What else is out there? I mean, I'm pretty sure you see also other systems that may not come to mind these days if you Google observability, which is kind of the hype word these days. Are there a lot of other systems that you see large enterprises,
Starting point is 00:44:54 especially, asking for, to get visibility into from a monitoring perspective? I would say Citrix is a big one that enterprises are asking for visibility into right now. And we do have monitoring for Citrix, where we can do that through what we call an extension. The benefit of monitoring Citrix is that we can provide a lot of metrics that will help to understand what exactly is going on with Citrix, what exactly you should be paying attention to. So, for example, you could look at database transactions, we can look at active sessions, we can look at app sessions, latency, and so on,
Starting point is 00:45:40 specifically for Citrix. So that's definitely been a big ask from enterprises, and we definitely have that capability. So Citrix, obviously, big technology for large enterprises. Is this mainly around, again, I know Citrix mainly through Nestor, but we typically talk about other things with Nestor on the podcast and on webinars and on blogs.
Starting point is 00:46:03 But in this case, Citrix is a lot about desktop virtualization, I would assume, right? So it's really about enabling the employees of an organization to really work efficiently with the virtualized applications and virtualized desktops, correct? That is correct. Yep, so we could do Citrix Virtual Apps, or Citrix XenApp and Citrix XenDesktop, all those options. Very cool. Andy, I hope we don't have to give Nestor another pair of shoes because you brought him up. It just comes to mind whenever the word Citrix comes up, obviously, right? It's just the way it is. Yeah, if you're listening, we love you. I'm pretty sure he gets his next pair of sneakers anyway, because I'm pretty sure once we roll back
Starting point is 00:46:53 to being on stage again at some point after COVID, he will be on stage again. And even, I think, speakers that do virtual engagements, I'm pretty sure, will get sneakers too. Sneakers for speakers, that's the program. So, to kind of round this up, it's interesting for me as a confirmation, right? It's not only the hot, cool new thing that is important, but it's also these technologies that we may sometimes forget, because we always talk about Kubernetes. Citrix is one of those. I know we, do we see a lot of SAP out there? Because I know we have SAP support too. Yes, definitely we see SAP. Another big one that came to mind is F5,
Starting point is 00:47:38 and IBM, IBM MQ is another big one. So definitely those technologies, where it's probably not as popular as your typical Java or .NET, we definitely see those, where big customers are asking for support for these technologies. Very cool. I think, I mean, I'm pretty sure we could probably talk a lot about best practices and
Starting point is 00:48:07 other things and technologies. But overall, I mean, for me this was extremely insightful, and especially your scenarios, you know, that we talked about earlier with the cloud migration. I found this really fascinating, because I think we are still just at the beginning of people migrating to the cloud, and knowing that those that actually have good planning, and don't just lift and shift but really re-platform, are the ones that tend to be more successful, that's great to hear. And it's also great to hear that we have a strong Dynatrace One team that supports and consults our customers in how to use Dynatrace pre, post, and also, I think, during their migration.
Starting point is 00:48:48 That's awesome. Is there anything else we are missing? Is there anything we want to get out to people that are listening? They may or may not be Dynatrace customers. So anything from a monitoring side, is there anything where you say, well, I wish more people that look into monitoring, that look into performance engineering, would know about this, so it will be easier for us? I don't know if something comes to mind. That's a good question. Let me think about that for a
Starting point is 00:49:15 second, because there might be. Maybe you always run into similar issues? That, I don't know, maybe it's not the right people that are actually interested in performance? Or is it the big challenge to actually sit down and figure out what is actually important, what needs to be monitored? That nobody really has given enough thought to what it really is that they want to get out of monitoring besides big dashboards.
Starting point is 00:49:41 Right. I would say I do wish that customers would take the time to understand the application. And I know at customer sites, for some customers, you have a large turnover. So people are going in and out. So the folks that know are leaving and new folks coming in, they probably didn't get the correct training or so on.
Starting point is 00:50:02 But I do think that customers should take the time to really understand the apps and also take the time to really consider the alerting strategy. I do think that's very important. And just like we talked about tags earlier, that's driven by tags as well. But to think about when something happens in Dynatrace or when Dynatrace picks up a problem, who exactly should it go to?
Starting point is 00:50:27 It has to go to the person that can actually take action on the problem. So definitely alerting strategy and understanding the application are two big things
Starting point is 00:50:38 that I want customers to take away. Very cool. Yeah, that's great advice. Brian, is there anything from your end? No, I got mine in during the show. You know, something pops in my head, I interrupt everyone and jump in.
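Kayan's closing advice, route each detected problem to the person who can actually act on it, is worth one more sketch. The team names, severities, and channels below are invented for illustration:

```python
# Sketch of a tag-driven alerting strategy: tags decide the owning team,
# severity decides the channel. All routes and channels are invented.

ROUTES = {"team:payments": "payments-oncall", "team:frontend": "frontend-oncall"}
CHANNELS = {"critical": "page", "warning": "chat", "info": "ticket"}

def route_problem(problem):
    """Return (who, how) for a detected problem, with loud fallbacks."""
    owner = next((ROUTES[t] for t in problem["tags"] if t in ROUTES),
                 "triage-queue")  # unowned problems still land somewhere visible
    channel = CHANNELS.get(problem["severity"], "ticket")
    return owner, channel

print(route_problem({"tags": {"team:payments", "stage:prod"}, "severity": "critical"}))
print(route_problem({"tags": {"stage:prod"}, "severity": "warning"}))
```

The fallback routes are the important design choice: an alert with no matching tag should end up in a visible triage queue rather than silently dropped, because a dropped alert is exactly the "nobody could act on it" failure the advice warns about.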
Starting point is 00:50:56 But I think there's a lot of really cool and interesting things that were talked about today. And thank you, Kayan, for bringing them all up. And I really just, um, you know, in the beginning you had, I got to drop that because I lost the thought, um, so I'll go with another one. I really liked, um, you know, how it was very apparent, by what you discussed,
Starting point is 00:51:31 Of course, we'd love you to use Dynatrace, but all of these tools, all the modern tooling requires the tags, requires organization, requires planning. Do it now. How you're going to prepare or how you're going to monitor, even as you're talking about how you're going to route it to the right team. So tags, all these things feed into that. So it's not something that should be thought of as an afterthought. You know, we're seeing more and more, and Andy and I, you and I have seen this since all the years we've been working in performance, the rise of the importance of
Starting point is 00:52:03 performance, from performance being an afterthought, from performance testing being that thing that those people do and they just hand us some numbers and they don't know what they're doing anyway, to it becoming more and more and more centrally focused. And it's just really great to see that all happening and really being taken seriously, especially because of how critical of a role it plays
Starting point is 00:52:23 in these modern environments where, yeah, you have the scalability options, you have infrastructure at your fingertips, but as we know, if you just throw hardware at it, again, you have the same problem of costing more money, and it's so much easier to spend money in the cloud without knowing it than on-prem. So it's a huge piece of it. And Kayan, thank you so much for sharing your experiences. And I think it's also, your experience of going through that program and now being in management is pretty awesome.
Starting point is 00:52:54 That's a great accomplishment on your part. So congratulations to you for just being awesome. Thank you. And it was really a pleasure to be here. Cool. Andy, do you have any, are we summarizing,
Starting point is 00:53:09 are we doing the summary today? Sure we can, but we don't have to. But I'll keep it short, because I think I already reiterated most of the things. But Kayan, typically at the end, I kind of summarize what I learned.
Starting point is 00:53:27 And I think it comes down to what was just said, right? It's very important that people understand that monitoring is critical, but more important is that they understand what to monitor, how to monitor, how to organize the data, also who is the beneficiary of the monitoring data, who should be alerted, who should get what data at which time, and what level of data do we actually need. I also really liked the fact that we are, you know, seeing these cloud migrations, as I just said, and helping people onto these new platforms like Kubernetes or OpenShift. But that we, on the other hand, see that this is a diverse world we live in, from a technology perspective as well as with a lot of other things in life. And that we still see a lot of customers that are monitoring their Citrix environments,
Starting point is 00:54:18 and that are monitoring their SAP, their F5s, their MQs, and so on. So it's really important that, you know, we think of monitoring holistically across the whole organization, because not today, not tomorrow, and not in three years will you just have microservices in Kubernetes. We will have a very diverse set of technology for a very long time. And all these pieces are very important. And so we need to figure out what are the important metrics, so we can make the right decision in case something happens. And I think you did a great job in telling us that we as an organization have a great Dynatrace One team
Starting point is 00:55:00 that supports our customers. We learn from them across the globe, across different verticals. And we have best practices and great teams that can apply these best practices to all the other customers. And thank you so much for doing this job. And hopefully, yeah, we'll have you back again with more stories in the near future.
Starting point is 00:55:23 Thank you very much, Andy. Awesome. All right. Well, if anybody has any topics or anything they'd like for us to discuss, they can reach out to us at pure underscore DT. Kayan, do you do
Starting point is 00:55:39 LinkedIn or anything, do you want to get any followers, anything you want to put out there? I'm on LinkedIn at Kayan Hales. Kayan Hales? Yes. H-A-L-E-S, not H-A-Y-E-S. As Andy might spell it, right?
Starting point is 00:55:55 You got it right. H-A-L-E-S. I'll just do my own shameless self-plug. September 4th, my band, we have an EP coming out on streaming. It's a compilation of stuff that wasn't released between 2004 and 2008, the first of three EPs of unreleased material, so I'm excited about that. And, uh, Andy, anything that you wanted to bring up? No, I think that's it. Um, well, actually, no, I have to bring up one thing, because we haven't brought it up today. It's called Keptn.
Starting point is 00:56:25 Go to www.keptn.sh. It's a new thing we're working on, and it would be great to have more people on that project. But that's it. Yep, and Perform is going to be virtual this year. So I think there's been some, I don't know if there have been any releases yet. I know there's some stuff internally going on with that so far, but keep a lookout for that,
Starting point is 00:56:48 people. There'll be some news coming out, if it hasn't already hit the streets yet, on what's going on there. And I guess, everyone, you know, you might not be able to get your unicorn, but please remember to vote.
