PurePerformance - AI Ops Enhancements with Chief Product Officer Andreas Lehofer

Episode Date: January 31, 2019

In this episode, Andi has a coffee and a chat with Dynatrace's Chief Product Officer Andreas Lehofer, where they dig a little deeper into AI Ops enhancements.

Transcript
Starting point is 00:00:00 Coming to you from Dynatrace Perform in Las Vegas, it's Pure Performance. Hello from Dynatrace Perform 2019 in Las Vegas. I'm Andy Grabner and this is Up Close and Personal with product management on Pure Performance. Actually not product management, it's actually the chief product officers here. Therefore I want to introduce Andreas Lehofer. Hi, welcome to Vegas. Hi Andy, hi everybody. Great being here. It's actually Andy to Andy as we just said. Here we go. Andy to Andy. Hey, so Andy or Andreas, I typically interview people that came out of their own breakout session. I know you are in the lucky situation that you have all your product managers doing all the tough work this year.
Starting point is 00:00:53 But still, hearing from the chief product officer, one of the chief product officers, what's happening in the Dynatrace universe, what's happening on the product. Remember, this is a podcast that people will probably hear that didn't make it to Perform. So maybe some of the highlights in your area that you want to get across. Right. So we have taken our AI engine to the next level, and in the breakouts we will speak a lot about the possibilities that customers have with this pretty significant improvement. So this has a couple of elements, and I think we should touch upon the two bigger ones. One is the openness of the Dynatrace platform altogether, and especially of the Dynatrace AI engine. And the other important improvement is around a further reduction of noise
Starting point is 00:01:46 by providing at the same time deeper visibility into root cause, which is a very difficult thing to do. And it's also very valuable and important that we have introduced a concept that we call change point detection to the product, which is doing exactly that. Can you elaborate a little bit on that change point detection? Does this mean detecting changes in the deployments, or what's that all about? It's detecting change points for all time series and also events that are stored in Dynatrace, and it's helping us solve a very basic, fundamental statistical problem, which is the statistical noise that you are producing when you define baselines
Starting point is 00:02:34 or thresholds. The problem is, however good your baselines or thresholds are, they will always produce a certain number of what are called false positives. From a user perspective this is perceived as event noise, right? And the more metrics and the more components that you have in an environment, the more of those false positives you will get. And there is nothing you can do about that. It's a statistical fact.
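To make that noise arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is an illustrative assumption, not a Dynatrace figure:

```python
# Back-of-the-envelope: why more baselined metrics mean more alert noise.
# All numbers below are illustrative assumptions, not Dynatrace figures.

metrics = 10_000          # time series with a baseline or threshold on them
evals_per_day = 24 * 60   # each baseline evaluated once per minute
fp_rate = 1e-5            # false-positive probability per evaluation (optimistic)

expected_false_alerts = metrics * evals_per_day * fp_rate
print(f"Expected false alerts per day: {expected_false_alerts:.0f}")  # -> 144
```

Even with a very optimistic 0.001% error rate per evaluation, this hypothetical environment produces well over a hundred spurious alerts every day, purely from statistical noise.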
Starting point is 00:03:02 So the only solution to that is to keep the number of baselines and thresholds that you have in your system at a minimum level. You always need some, that is clear, because without a baseline or threshold there is no alerting at all, but you want to keep the number of metrics where you define a threshold, where you calculate a baseline and alert upon the breach of that baseline, to a minimum. So that is obvious, but if you do that, you run into a different problem, which is that eventually there is a significant change in the behavior of a metric that you are not detecting anymore, because there is no baseline on that metric, there is no threshold on that metric that is breached. And what change point detection does is that in the context of a problem, so let's say
Starting point is 00:03:56 a baseline that you still have on a service level or on an application level is breached. In the context of that breach of a high-level service baseline, we are automatically analyzing all the infrastructure metrics that we have and that are related to that service or that application that is degrading in user experience, and we run the change point detection algorithms to see if there is any change on that level, without anybody having to define a threshold, without the need to calculate a baseline upfront. And when we see a change point, we are showing that in the user interface, so you will find all the root causes on an infrastructure level without creating noise from that level.
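As an illustration of the idea (this is not Dynatrace's actual algorithm), here is a minimal CUSUM-style change point sketch in Python: normal behavior is estimated from a short reference window, and the detector is only run over infrastructure metrics already related to the breached service baseline, so nobody ever configures a threshold on them. The metric names and values are hypothetical:

```python
import statistics

def change_point(series, ref=6, k=0.5, h=5.0):
    """One-sided CUSUM sketch: estimate normal behavior from the first
    `ref` samples, then look for a sustained shift in the rest.
    Returns the index of the first detected shift, or None."""
    mu = statistics.mean(series[:ref])
    sigma = statistics.stdev(series[:ref]) or 1.0  # guard against flat windows
    s_hi = s_lo = 0.0
    for i in range(ref, len(series)):
        z = (series[i] - mu) / sigma
        s_hi = max(0.0, s_hi + z - k)  # evidence of an upward shift
        s_lo = max(0.0, s_lo - z - k)  # evidence of a downward shift
        if s_hi > h or s_lo > h:
            return i
    return None

# Only run in the context of a problem: a service-level baseline was breached,
# so we scan the infrastructure metrics related to that service (hypothetical
# names and values) without anyone having configured thresholds on them.
related_metrics = {
    "host.cpu.usage":  [20, 22, 21, 19, 23, 21, 20, 55, 58, 61, 57, 60],
    "process.threads": [40, 41, 39, 42, 40, 41, 40, 41, 39, 40, 42, 41],
}
for name, values in related_metrics.items():
    idx = change_point(values)
    if idx is not None:
        print(f"change point in {name} at sample {idx}")  # root-cause candidate
```

The CPU series is flagged at the jump from ~21 to ~58, while the steady thread count stays silent, which is exactly the "insight without noise" trade-off described above.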
Starting point is 00:04:49 That's pretty cool. And I think the magic behind the scenes is obviously our dependency model, our Smartscape model. Right, that makes sure that we are not pointing you to metrics that are irrelevant for the problem that you are looking at. And that means also we do this change point detection at the time when we detect there is actually an impact on the end user, and then we walk through the whole dependency tree, through all the metrics, right?
Starting point is 00:05:11 Exactly. So if you want to give an example, think about CPU consumption. Everybody is interested to understand whether there is a significant change in CPU consumption, but it makes no sense at all to alert upon a significant change in CPU consumption on its own. If this has no impact on user experience, it's not actionable. Maybe it's even great, maybe it's just showing that your application is scaling very well and is able to consume CPU. But if you have a response time problem on a service level and we
Starting point is 00:05:46 see that at the same time there is a significant change in the way your application is consuming CPU, and it could be, by the way, either a significant increase or a significant decrease, because also a significant decrease in CPU consumption is something that you absolutely want to know about; it's typically an indicator of a synchronization issue, right? So we will point you to that. Sticking with the synchronization issue, let's assume you are collecting a metric on that service, on a process level, that is showing you the concurrent threads that are used. If you have a synchronization issue, you would expect that that metric shows a very significant
Starting point is 00:06:30 behavior change. Dynatrace AI 2.0 will pick that up with anomaly detection, with change point detection as well, and will show you: okay, there is a change in thread behavior and in CPU. And that makes sense, that's great, because you would normally never specify a static threshold on the number of threads. It doesn't make sense because you expect... Very difficult, very difficult. Yeah, that's pretty cool. And then I think I heard from Wolfgang Baer from product management, who also talked about the fact that we can feed more data into the AI engine from external tools, so we already covered that. But that means everything you just described,
Starting point is 00:07:07 the whole change detection also applies to metrics that come in from external tools? Absolutely. And not only metrics, also events. If you send deployment events, whatever type of event you ingest into Dynatrace, we also look at the different patterns on an event level. That's cool.
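For illustration, pushing a deployment event into Dynatrace might look roughly like the sketch below, based on the Events API v1 as documented around that time; the environment URL, token, service tag, and deployment details are all placeholders, so check the current API reference before relying on the exact endpoint and fields:

```python
import requests  # third-party: pip install requests

DT_ENV = "https://abc12345.live.dynatrace.com"  # placeholder environment URL
DT_TOKEN = "REPLACE_WITH_API_TOKEN"             # placeholder token

deployment_event = {
    "eventType": "CUSTOM_DEPLOYMENT",
    "deploymentName": "checkout-service rollout",
    "deploymentVersion": "1.4.2",
    "source": "ci-pipeline",
    # attachRules tie the event to monitored entities so the AI engine
    # can correlate it with problems it detects on those entities
    "attachRules": {
        "tagRule": [{
            "meTypes": ["SERVICE"],
            "tags": [{"context": "CONTEXTLESS", "key": "checkout"}],
        }]
    },
}

resp = requests.post(
    f"{DT_ENV}/api/v1/events",
    json=deployment_event,
    headers={"Authorization": f"Api-Token {DT_TOKEN}"},
)
resp.raise_for_status()
print("event accepted:", resp.json())
```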
Starting point is 00:07:48 So Andreas, you're one of the, I think it's three, chief product officers. So your area is everything around AI, everything around Davis? Around the AI and the Dynatrace platform. So whenever it is about data ingress into Dynatrace or about dashboarding topics, this is my area. Cool. So now a little future prediction. I know product management never wants to talk about any deadlines or stuff like that, but let's see: in 2020, when we are probably back in Vegas for the next Perform, what will be the things that we hope to see coming up? Maybe one or two things, right? That is a difficult question to ask, especially in that area, but I would say there are big things coming in a year from now, though I think some of them I have to keep as a secret.
Starting point is 00:08:34 Maybe to speak about one thing that is not a huge thing, though, is we are continuing to extend this concept of anomaly detection into different areas. So one of the things that you will see in the next 12 months is extending that into log monitoring. So we will automatically discover different patterns in a log file. So if you see a log line that we have never seen before in the context of a problem, we will automatically point you to that. So we are extending that and we are continuously of course investing into our AI capabilities
Starting point is 00:09:14 So we are extending that, and of course we are continuously investing into our AI capabilities in general. Perfect, cool. Hey Andy, well, enjoy the rest of the conference. Thanks for building such an awesome product with the rest of the team. And in case listeners are actually in Vegas, I assume, and they want to talk to you, they can probably find you in the exhibition area, in the marketplace?
Starting point is 00:09:33 absolutely, try to get hold of me it's one of the really important things for me to get in touch with customers listen to your feedback looking forward to talking to you so thanks Andy for Pure Performance, I am Andy Gradman Listen to your feedback. Looking forward to talking to you. So thanks, Andy. For Pure Performance, I am Andy Grabman.
