PurePerformance - AI Ops Enhancements with Chief Product Officer Andreas Lehofer
Episode Date: January 31, 2019
In this episode, Andi has a coffee and a chat with Dynatrace's Chief Product Officer Andreas Lehofer where they dig a little deeper into AI Ops Enhancements...
Transcript
Coming to you from Dynatrace Perform in Las Vegas, it's Pure Performance.
Hello from Dynatrace Perform 2019 in Las Vegas. I'm Andy Grabner and this is Up Close and Personal
with product management on Pure Performance. Actually, not product management, it's one of the chief product officers here. Therefore I want to introduce Andreas Lehofer. Hi, welcome to Vegas. Hi Andy, hi everybody. Great being
here. It's actually Andy to Andy as we just said. Here we go. Andy to Andy. Hey, so
Andy or Andreas, I typically interview people that came out of their own breakout session.
I know you are in the lucky situation that you have all your product managers doing all the tough work this year.
But still, hearing from the chief product officer, one of the chief product officers, what's happening in the Dynatrace universe, what's happening on the product. Remember, this is a podcast that people who didn't make it to Perform will probably hear. So maybe share some of the highlights in your area that you want to get across.
Right. So we have taken our AI engine to the next level and in the breakouts we will speak a lot
about the possibilities that customers have with this pretty significant improvement. So this has a couple of elements, and I think we should touch upon the two bigger ones.
One is the openness of Dynatrace platform altogether, and especially of the Dynatrace
AI engine.
And the other important improvement is around a further reduction of noise
by providing at the same time deeper visibility into root cause,
which is a very difficult thing to do.
And it's also very valuable and important that we have introduced a concept
that we call change point detection to the product, which is doing exactly that. Can you elaborate a little bit on that change point detection? Does this mean detecting
changes in the deployments or what's that all about? It's detecting change points for all time
series and also events that are stored in Dynatrace, and it's helping us solve a very basic, fundamental statistical problem, which is the statistical noise that you produce when you define baselines or thresholds.
The problem is, however good your baselines or thresholds are, they will always produce a certain number of what are called false positives.
From a user perspective this is perceived as event noise, right?
And the more metrics and the more components that you have in an environment, the more
of those false positives you will get.
And there is nothing you can do about that.
It's a statistical fact.
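To make that statistical point concrete, here is a minimal back-of-the-envelope sketch in Python. The 0.01% per-check false-positive rate and the one-evaluation-per-minute cadence are illustrative assumptions, not Dynatrace figures; the point is that the expected number of false alerts grows linearly with the number of baselined metrics, however good each individual baseline is.

```python
# Illustrative only: how false positives scale with the number of baselines.
per_check_fp_rate = 0.0001   # assumed 0.01% false-positive rate per baseline check
checks_per_day = 1440        # assumed one evaluation per minute, per metric

for n_metrics in (10, 100, 1_000, 10_000):
    evaluations = n_metrics * checks_per_day
    expected_false_alerts = evaluations * per_check_fp_rate
    p_at_least_one = 1 - (1 - per_check_fp_rate) ** evaluations
    print(f"{n_metrics:>6} metrics: ~{expected_false_alerts:8.1f} false alerts/day, "
          f"P(at least one) = {p_at_least_one:.4f}")
```

At ten metrics that is roughly one spurious alert a day; at ten thousand metrics it is well over a thousand, which is exactly the event noise Andreas describes.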
So the only solution to that is to keep the number of baselines and thresholds that you have in your system at a minimum level. You always need them.
That is clear, because without baselines you cannot alert at all. So you keep the number of places where you define a threshold, where you calculate a baseline and alert upon the breach of a baseline, to a minimum. That is obvious, but if you do that, you run into a different problem, which is that eventually there is
but if you do that, you run into a different problem, which is that eventually there is
then a significant change in the behavior of a metric that you are not detecting anymore
because there is no baseline on that metric, there is no threshold on the metric that is breached.
And what change point detection does is that in the context of a problem, so let's say
a baseline that you still have on a service level, on an application level is breached.
In the context of that breach of a high level service level baseline, we are
automatically analyzing all the infrastructure metrics that we have and that are related
to that service or that application that is degrading in user experience and we run the
change point detection algorithms to see if there is any change on that level. Without defining a threshold upfront, without the need to calculate a baseline ahead of time, we can see if there are significant changes. And if we see a change point, we show that in the user interface, so you will find all the root causes on an infrastructure level without creating noise from that level.
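Dynatrace does not spell out its algorithms here, so purely as an illustration of the idea, the sketch below implements a naive two-window mean-shift detector in Python. The function name `detect_change_point`, the window size, and the three-sigma cutoff are hypothetical choices for this sketch, not the product's implementation; note that it needs no preconfigured threshold or precomputed baseline.

```python
from statistics import mean, stdev

def detect_change_point(series, window=10, min_shift_sigmas=3.0):
    """Naive two-window mean-shift detector (illustrative only).

    Flags the first index where the mean of the next `window` points
    differs from the mean of the previous `window` points by more than
    `min_shift_sigmas` standard deviations of the earlier window.
    """
    for i in range(window, len(series) - window + 1):
        before = series[i - window:i]
        after = series[i:i + window]
        spread = stdev(before) or 1e-9          # guard against flat data
        shift = abs(mean(after) - mean(before)) / spread
        if shift > min_shift_sigmas:
            return i, shift                     # (change point index, shift score)
    return None

# A CPU-like metric that jumps from ~50 to ~80 halfway through:
cpu = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50,
       80, 79, 81, 80, 82, 78, 80, 81, 79, 80]
print(detect_change_point(cpu))                 # -> (10, ~26.0)
```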
That's pretty cool.
And I think the magic behind the scenes is obviously our dependency model, our Smartscape model.
Right, that makes sure that we are not pointing you to metrics that are irrelevant for the
problem that you are looking at.
And that means also we do this change point detection at the time when we detect there
is actually an impact to the end user and then we walk through the whole dependency
tree through all the metrics, right?
Exactly.
So if you want to give an example, think about CPU consumption.
Everybody is interested to understand whether there is a significant change in CPU consumption,
but it makes no sense at all to alert upon a significant change in CPU consumption on its own.
If this has no impact on user experience, it's not actionable, maybe it's great, maybe
it's just showing that your application is scaling very well, it's able to consume CPU.
But if you have a response time problem on a service level and we
see that at the same time there is a significant change in the way your
application is consuming CPU, and it could be, by the way, either a significant increase or a significant decrease. A significant decrease in CPU consumption is also something that you absolutely want to know about; it's typically an indicator of a synchronization issue, right?
So we will point you to that.
Sticking with the synchronization issue, let's assume you are collecting a metric on that service,
on a process level that is showing you concurrent threads that are used.
If you have a synchronization issue, you would expect that that metric shows a very significant
behavior change.
Dynatrace AI 2.0 will pick that up with anomaly detection, with change point detection as well, and will show you: okay, there is a change in thread behavior and in CPU. And that makes sense, that's great, because you would normally never specify a static threshold on the number of threads.
It doesn't make sense because you expect...
Very difficult, very difficult.
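Sticking with that example: reusing the hypothetical `detect_change_point` from the sketch above, a thread-count series that jumps when threads start queuing behind a lock is flagged without anyone ever having configured a static threshold for it.

```python
# Reusing the illustrative detector from the earlier sketch:
threads = [12, 11, 13, 12, 12, 11, 13, 12, 11, 12,
           45, 47, 44, 46, 48, 45, 44, 47, 46, 45]
print(detect_change_point(threads))   # -> (10, <large shift score>): behavior change
```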
Yeah, that's pretty cool. And then I think I heard from Wolfgang Beer from product management that we can feed more data into the AI engine from external tools, so we already covered that. But that means everything you just described, the whole change detection, also applies to metrics that come in from external tools?
Absolutely. And not only metrics, also events. If you send deployment events,
whatever type of event you ingest into Dynatrace, we also look at different patterns
on an event level.
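As a sketch of what that ingestion can look like, the snippet below posts a deployment event over the Dynatrace Events REST API. The endpoint and payload fields follow the v1 Events API as best recalled, and the environment URL, token, and entity ID are placeholders; verify the exact contract against the Dynatrace API documentation.

```python
import requests

DYNATRACE_URL = "https://YOUR-ENVIRONMENT.live.dynatrace.com"   # placeholder
API_TOKEN = "YOUR-API-TOKEN"                                    # placeholder

event = {
    "eventType": "CUSTOM_DEPLOYMENT",
    "deploymentName": "checkout-service release",
    "deploymentVersion": "1.4.2",
    "source": "ci-pipeline",
    # Ties the event to entities in the Smartscape model (placeholder ID).
    "attachRules": {"entityIds": ["SERVICE-0000000000000000"]},
}

resp = requests.post(
    f"{DYNATRACE_URL}/api/v1/events",
    json=event,
    headers={"Authorization": f"Api-Token {API_TOKEN}"},
)
resp.raise_for_status()
print(resp.json())
```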
That's cool.
So Andreas, you're one of the three, I think it's three chief product officers.
So your area is everything around AI, everything around Davis?
Around the AI and around the Dynatrace platform. So whenever it is about data ingress into Dynatrace or dashboarding topics, this is my area. Cool. So now a little future prediction.
I know product management never wants to talk about any deadlines or stuff like that, but let's
see: in 2020, when we are back in Vegas for the next Perform, what will be the things that we hope to see coming up? Maybe one or two things, right?
That is a difficult question to ask, especially in that area, but I would say there are big things coming a year from now, though I think some of them I have to keep as a secret.
Maybe one thing to speak about, though it is not a huge thing: we are continuing to extend this concept of anomaly detection into different areas.
So one of the things that you will see in the next 12 months is extending that into
log monitoring.
So we will automatically discover different patterns in a log file.
So if you see a log line that we have never seen before in the context of a problem, we
will automatically point you to that.
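As a rough illustration of what "a log line we have never seen before" could mean (this is not Dynatrace's implementation), one approach is to reduce each line to a template by masking its variable parts and flag lines whose template is new; in practice, the known set would be seeded from historical logs.

```python
import re

def template(line):
    """Mask the variable parts of a log line (illustrative heuristic)."""
    line = re.sub(r"\b[0-9a-f]{8,}\b", "<HEX>", line)     # ids and hashes
    line = re.sub(r"\b\d+(\.\d+)?\b", "<NUM>", line)      # numbers
    return line

known = set()   # in practice, seeded from historical log data
for line in [
    "GET /cart took 31 ms",
    "GET /cart took 48 ms",               # same pattern, not flagged again
    "OutOfMemoryError in worker 7",       # never-seen pattern -> flagged
]:
    t = template(line)
    if t not in known:
        known.add(t)
        print("new pattern:", t)
```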
So we are extending that and we are continuously of course investing into our AI capabilities
in general.
Perfect, cool.
Hey Andy, well, enjoy the rest of the conference. Thanks for building such an awesome product with the rest of the team. And in case listeners are actually in Vegas, I assume, and they want to talk to you, they can probably find you in the exhibition area, in the marketplace.
Absolutely, try to get hold of me. It's one of the really important things for me, to get in touch with customers and listen to your feedback. Looking forward to talking to you.
So thanks, Andy. For Pure Performance, I am Andy Grabner.