PurePerformance - Bringing Observability to .NET with Georg Schausberger and Bernhard Ruebl
Episode Date: October 5, 2020

Getting visibility into .NET code, whether it runs on a developer machine, on a Windows server on-premises, or as a serverless function in the cloud, is the day-to-day job of Georg Schausberger (@BombadilThomas) and Bernhard Ruebl, part of the Dynatrace .NET Agent Team. In this podcast we hear firsthand about the challenges in bringing observability, monitoring, and distributed tracing to the .NET ecosystem. They give us insights into their continued effort to reduce startup and runtime overhead, the innovation coming out of Microsoft as it moves towards open standards, and the novel automated approach to continuously validate that the constant updates of libraries and frameworks don't break monitored code. We also got both to talk about their developer experience when working with commercial tools such as Dynatrace and its PurePath technology, as well as open source tools, when analyzing and debugging their own code or helping users figure out what's wrong with their code.

In the talk both mentioned other tools, for which we wanted to provide the links:

BenchmarkDotNet: https://benchmarkdotnet.org/articles/overview.html
Ben.Demystifier: https://www.nuget.org/packages/Ben.Demystifier/
IIS Failed Request Tracing: https://forums.ivanti.com/s/article/How-To-Enable-IIS-Failed-Request-Tracing

Georg Schausberger: https://twitter.com/BombadilThomas | https://www.linkedin.com/in/georg-schausberger-6898b6141/
Bernhard Rübl: https://www.linkedin.com/in/bernhard-r%C3%BCbl-084881104/
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's another episode of Pure Performance.
My name is Brian Wilson and as always with me is my co-host Andy Grabner.
Hello Andy, how are you doing in your tree village?
In my tree village, I know, yeah.
I was surprised to learn about that too and I had to look out the window
and I think I see two trees in my village.
But it seems your commander-in-chief knows better what Austria looks like. But yeah, I'm doing pretty well these days. And I'm not sure if the audience knows, but I have a new microphone, and hopefully not only the audio quality improves, but I also hope that the quality of the content at least stays the same.
We'll see about that.
How about you?
Did you hit the content quality button on that mic?
Oh yeah, see, that's the secret button at the bottom.
I'm doing pretty good.
I'm getting over, you know,
just being overwhelmed by everything,
but we're getting into some nicer weather. I think when we did the last podcast, we had been getting some snow, and we're back up into some nice weather to carry us out through the fall before it actually gets cold.
So things are going pretty okay
as much as they can in these fun
and interesting times of 2020.
But happy to be back podcasting.
One thing we didn't do last time
was thank our listeners for being patient as we had to do a couple of reruns while we lined up some guests. So just thanks, everyone, for being patient and sticking with us. And yeah, we're back into recording and we have another great new topic today, Andy. We talked over the summer about what topics we should focus on, and I think you actually brought up that .NET is a technology we see a lot out there with the companies we work with. And then we remembered we had Philipp Lengauer, one of our engineers from Dynatrace, on the podcast, I think it was almost a year ago, where he talked about Java runtime and Java performance. And then you asked me, well, isn't there somebody equivalent to Philipp that can tell us more about .NET, but then particularly also from a monitoring perspective: how do we actually monitor .NET?
What are the challenges out there?
And that's why, without further ado, I want to introduce actually two guest speakers.
And we are all, it's funny, we're all sitting in the city of Linz, or at least probably somewhere in the close vicinity, but we're not in the same room, thanks to COVID. But I want to first introduce, or let Georg Schausberger introduce himself, before I hand it over to the second guest. Georg, good to have you with us; maybe give a quick intro to the audience, who you are and what you do.
Yeah, so hello everyone. This is also my first time recording a podcast, so hopefully everything goes well and it's nice and interesting for you. My name is Georg Schausberger, I live in Linz in Austria, and I am part of the .NET development team at Dynatrace. We are developing the code module which is responsible for getting data out of .NET applications, in all kinds of applications. And yeah, that's pretty much what we are doing. So, a lot of things behind the scenes, reverse engineering of technologies, and a lot of interesting stuff for us.
Very cool.
And then with you, I know, I guess in a different room or in a different building or in a different place altogether, is Bernhard Rübl. Now I tried to pronounce your name in English, even though I should know how to pronounce it in German. But Bernhard, thanks for being with us.

Thanks for having me. I'm so glad to be here. Well, my name is Bernhard Rübl. That's the correct way to pronounce it.
And I'm also a developer on the .NET team, working together with Georg on our code module. I've been at Dynatrace now for five years or so, and I also live in Linz. Yeah, I'm happy to tell you stuff about .NET and .NET performance monitoring today.
Please go on and ask us.
Perfect.
All right.
So, Brian, I do have a couple of questions
and I want to kick it off.
Please do.
Okay.
No, let's go, guys.
Yours might encompass mine, so let's start there. Okay.

Okay, so I know there is, I mean, there's a long history with .NET. At least, I remember when I got started back in .NET, I think I remember .NET 1.x, and I think a lot of things have changed, also from a monitoring perspective. Georg, you mentioned that you do a lot of reverse engineering, and I probably want to focus more on the more recent developments in the .NET landscape. Can you tell us a little bit about what technology stacks, what versions we are mainly dealing with these days, and where the biggest challenges are? Are things getting easier from a monitoring perspective, in how we get into the .NET runtimes? And especially, what has changed? You mentioned re-engineering or reverse engineering: is this still something that is extremely necessary, or have the vendors, in this case Microsoft, done a better job in allowing vendors like us to get into the runtime and get the data we need?

Yeah, so luckily for us, Microsoft also knew about the need of getting some external software
into their runtime and getting data out of it.
So they included the profiling interface very early in .NET. And this was the first approach I can remember, where we were attaching to the runtime and manipulating data there
and changing the final compiled programs
to get our information out of the software and applications.
But since this is a very technical and complicated way to do things, though still needed to get into many things and details of applications, they built up new ways to get at performance data more easily.
So there are a lot of event-based approaches where you can subscribe for certain events and get data and information from the application.
But all of these have to be implemented by the technology stack.
So for instance, if you have ASP.NET Core, you get events for the pipeline, for things done, for requests handled.
But you have to rely on the data the framework provider implements into the framework itself.
And of course, we are not at .NET 1 anymore; we have had .NET Core out for a while, and many of our customers and companies are moving to .NET Core, which means not only Windows is used,
but also Linux and various architectures
like ARM architectures coming up.
And you can't connect to every application runtime directly, since some of them are hosted in Azure.
And so there's a broad range of different platforms
and architectures where.NET is running.
And so we can't rely on one of the methods.
So we have to support many of them.
And that's one of the challenges.
So we have to keep up with all of them.
And in the areas we can't do the old way,
we have to find new ways
to get into the application.

So that's interesting, because from the outside you would assume, well, it's a runtime, and regardless of where the runtime runs, you should be able to get data in the same consistent way. But it seems there are very specific things about the underlying platform. Is that because we are also running a lot of native code on that platform, to, I don't know, get the data that we need, to do the type of instrumentation, or just to be faster and more performant and scalable? Is that why we need to support the different platforms in so many different ways?
It's not only about that. We need a native library to connect to the profiling interfaces, as they are called, from the runtime. And if you want to connect the profiler library to the runtime,
you have to have access to the system. And for instance, if you have some predefined instances on Azure,
it's difficult to install all that stuff there and set everything up. And recently, they were
introducing so-called pre-loaded runtimes. So you already have a running program in Azure, and you can't modify or connect any profiler in the running application so easily.
So you're getting caught there.
And on the other side, the easier thing is that the runtime is built for a specific platform. So if it has to run on Linux, it's a Linux application. And you have to provide libraries for all platforms and architectures, to run there smoothly.
Andy, I wanted to ask one question
because you touched upon it here
and maybe I'm getting a little ahead of things,
but just briefly,
I remember, Andy,
when we talked a while back, when we first had our discussion about .NET Core, there was this concept that if you're running .NET Core on a Windows container, or in any kind of Windows-related system,
you can still capture your memory metrics and all the other supporting components. But when you were running it in Java, at least access to, I don't want to say the JVM, but, you know, the heap and all that other information was not accessible, at least back then.
And I'm just curious if there's an update,
has that changed at all?
Or is that still one of the challenges in monitoring .NET Core on, like, a Linux system? Or do I completely misremember that?
I think you mixed Java and Linux, you wanted to say.
I remember when we had a discussion about .NET on Windows.
Yeah, yeah, exactly, yeah.
Well, we had some improvements there
because Microsoft encountered those issues as well
and addressed them with a new event-based system to provide callbacks to monitoring solutions called NetTrace.
And since this became available with, correct me if I'm wrong, but I think it was .NET Core 3.0, it also became available in Dynatrace, and we started to, let's say, collect GC metrics via NetTrace callbacks, and do not have to rely on other components of Dynatrace anymore. We can do that directly in the .NET agent now.
And I think this is a big improvement
compared to the older implementations
where you have to have performance counters
or stuff like that in.
Right, right.
I remember that was, it's all coming back now.
I remember that was part of the issue,
was that Microsoft itself didn't have
ways, because they were relying on the performance counters
on the Windows-based systems, but they didn't really
have a mechanism for Linux.
So it's great to hear that they've improved that.
All right, didn't mean to sidetrack too much, I just wanted to get that in while it's still fresh in my head.
Yeah, no, that's good, but
Bernhard, maybe as a follow-up on this, because you mentioned
this event-driven system, and you also put in the word tracing.
I mean, again, this comes from just looking at it from the outside, and maybe we have listeners that don't know what we've been doing over the years in regards to application monitoring and distributed tracing.
But you mentioned something about trace information
is being given to us as we subscribe to events.
Does this also include information about actual traces? Is the .NET Core runtime giving us insights into trace data, or is this still something that comes from somewhere completely different?
Yeah, well, you have to distinguish between two or three separate systems here.
There is, of course, the old Windows eventing system called ETW,
which allowed collecting massive amounts of trace data from a .NET application via tools like PerfView or the performance analyzer tools. Now we are relying more specifically on mechanisms that work within the process of the application itself. There is the NetTrace format, which works just for the application itself, when it comes to the profiler callbacks I just mentioned.
And there is another tooling that's called DiagnosticSource. That said, DiagnosticSource is the main source of information when it comes down to collecting trace data from third-party libraries like ASP.NET Core or gRPC, or several other Microsoft libraries that already implemented the changes, like Entity Framework, for example.
And this is, of course, an interesting new opportunity where we can dock on to those technologies. And this is where we also can
design what we call sensors to collect our data. But the mechanism that is used for, let's say,
GC metrics or allocation metrics is completely different, and relies on that NetTrace format; that is something we have to do in the native component. It is not something that is publicly available in the .NET framework itself. You can of course write some listeners, but it's not the same as hooking into the native interfaces. And that's the main challenge for those new sources: to get them all together. And yeah, you can definitely expect to get something out of this in the future.
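A minimal sketch of "docking onto" DiagnosticSource as described here: subscribe to all DiagnosticListeners, and for ASP.NET Core's "Microsoft.AspNetCore" listener, print each event that flows through. The listener name and event mechanism are the real public API; the observer classes are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Receives the individual diagnostic events (e.g.
// "Microsoft.AspNetCore.Hosting.HttpRequestIn.Start") from one listener.
class AspNetCoreEventObserver : IObserver<KeyValuePair<string, object?>>
{
    public void OnNext(KeyValuePair<string, object?> evt)
        => Console.WriteLine($"diagnostic event: {evt.Key}");
    public void OnCompleted() { }
    public void OnError(Exception error) { }
}

// Watches for DiagnosticListeners as they are created and attaches to the
// ASP.NET Core one; a real agent would attach sensors per technology here.
class AllListenersObserver : IObserver<DiagnosticListener>
{
    public void OnNext(DiagnosticListener listener)
    {
        if (listener.Name == "Microsoft.AspNetCore")
            listener.Subscribe(new AspNetCoreEventObserver());
    }
    public void OnCompleted() { }
    public void OnError(Exception error) { }
}

// Hook it up once at startup:
// DiagnosticListener.AllListeners.Subscribe(new AllListenersObserver());
```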
So if I can recap this, and if I understand this correctly, then you say a lot of libraries that come either with the core runtime or other popular libraries basically have pre-instrumentation built in, and are then exposing data through these event channels like DiagnosticSource.
And obviously, that's great for us to pick up, for any tool vendor basically to pick up. But it also means that if you want to get this data, these libraries have to be pre-instrumented
by whoever developed these libraries so that you actually get the data out.
You have to use the API from Microsoft that they provided and you have to build that into
your library.
Yeah.
Now, what do you see as the adoption of these libraries? The reason why I'm asking is because I know there are a lot of new tool vendors out there, both in the commercial and also in the open source space, that say, well, we have .NET monitoring now, and they're latching on to these publicly available interfaces, assuming that most applications deliver some type of diagnostics and tracing data.
how big is the adoption of this and how much visibility do you actually get with what comes,
let's say, quote unquote, out of the box and what is additionally needed to truly get full visibility?
Well, we see a lot of positive feedback and positive comments even on open source products as well.
Most of them are thinking about implementing it. And, well, if Microsoft is somehow involved, the implementation will be done quite quickly. But what I also see is that some open source tools require an additional, let's say, helper library or shim to just encapsulate the original library and add those tracing calls. And of course, if a customer wants to use that, they will have to use that additional library and not the original one. So I think the adoption will strongly depend on whether or not the third-party vendors will integrate the support for diagnostic sources into their libraries.
Yeah, and I know there are obviously some initiatives in the open source space around OpenTelemetry and OpenTracing, and Microsoft is also involved in these. Now, I'm not sure how familiar you two are with all the efforts around these projects, but wouldn't it be a smarter way to just standardize, to move towards these standards, so that it's, I guess, easier for library vendors to adhere to one standard regardless of the platform, and then also for tool vendors?

Well, DiagnosticSource is in fact Microsoft's approach to support OpenTracing.

Okay, that's a perfect answer, I didn't know that.

Right, that's exactly it. Yeah, correct me if I'm wrong, Georg, but I think DiagnosticSource is exactly the tool from Microsoft to open up spans, to write bags of values and keys to spans
And therefore, if you use DiagnosticSource to report tracing data in your third-party application,
you will have the possibility to use those tools.
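And for the producer side just described, a hedged sketch of how a library can report a span-like operation through DiagnosticSource and Activity, so that any subscriber (an OpenTracing bridge, an APM agent) can pick it up. The source name "MyLibrary" and its event names are hypothetical.

```csharp
using System.Diagnostics;

class OrderProcessor
{
    // One shared source per library; subscribers find it by this name.
    private static readonly DiagnosticSource Diagnostics =
        new DiagnosticListener("MyLibrary");

    public void ProcessOrder(string orderId)
    {
        Activity? activity = null;
        if (Diagnostics.IsEnabled("MyLibrary.ProcessOrder"))
        {
            activity = new Activity("MyLibrary.ProcessOrder")
                .AddTag("order.id", orderId);
            // Emits a "MyLibrary.ProcessOrder.Start" event to subscribers.
            Diagnostics.StartActivity(activity, new { orderId });
        }
        try
        {
            // ... the actual work of the library ...
        }
        finally
        {
            if (activity != null)
                // Emits "MyLibrary.ProcessOrder.Stop"; the Activity carries
                // the duration and the tags set above.
                Diagnostics.StopActivity(activity, new { orderId });
        }
    }
}
```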
Yeah, exactly. Also from my view, this is the API to use to define all the data you need for OpenTracing, since OpenTracing is more than just providing data from one application. It provides tracing through different applications in the end, since you have spans starting and connecting from one application to another as well. So this is also part of the game, since most of the data sources we were talking about before are just information within the application you are using: how is your application performing, what requests are coming in and out, and how long do they take to be processed, and so on.

Very cool. Okay, thanks, that was good information, I didn't know that. And I guess there are so many topics floating around, and that's great too. I think Brian and I always use this podcast to educate ourselves, because we learn a lot with every podcast, and hopefully the listeners that are not as deep down in the weeds as you guys are with .NET technology and monitoring and tracing also learn something new here.
I'm not sure who wants to take this, either Georg or Bernhard, but I know that over the years, and I've been with Dynatrace for 12 years now, one of the biggest questions that always comes up from everybody is the overhead question. Because every time you are instrumenting an app and you're loading something into an app, you are obviously changing the behavior. There are all sorts of overheads, whether it's, you know, resource overhead in terms of memory, or whether it is additional CPU cycles that we consume. Now, how are we, or how are you guys, ensuring that overhead stays minimal? What are the things we've built into our agent technology to really make sure we can monitor at that scale without having an overhead that is impacting the application?
I'm not sure who wants to take it, but OK.
I'm perfectly happy if Georg takes
the terribly difficult question.
Okay, thank you very much for giving the question to me.
That's a very good question.
So there are all kinds of overhead, as you already mentioned.
And one thing is the old big instrumentation approach, where we are loaded as a library into the profiling interface, and we are modifying the assemblies and the .NET code as it is loaded into the runtime. This adds some startup overhead, since we have to build up a model of the assemblies and the application which is loaded, and we are manipulating data, writing measurement method calls into the interesting parts of the application. And this takes quite some time to do. So this is the startup overhead, which is very bad for things like Azure Functions, where applications start up, and if you want to scale up, you want to have the functions instantaneously up and running and taking over requests.
That's one thing, and we address it by internally testing how long things take, and we do profiling of our own profiler to measure things there. And we have to tackle everything you can do there in terms of not wasting memory, not doing things all the time. At the moment, we are improving things there as well. Again, we want to avoid manipulating data we don't need in the application. So if someone doesn't want to monitor, let's say, some messaging technology, we will disable this part of our library, so we don't cause this overhead.
On the other side, for the runtime overhead, we have a lot of KPI tests and performance tests, and we are using BenchmarkDotNet internally for testing our own implementations, so we get very good statistics. So this is also a good point to advertise the BenchmarkDotNet project, which is fantastic, in my opinion. You can write very good unit-test-like benchmarks, run them against applications, and see how much overhead a single method or a manipulated method causes in terms of allocations and CPU overhead. You get the data of repeated runs, what happens after the JIT has optimized the code. And these tests make sure that we stay within our boundaries.
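To illustrate the kind of micro-benchmark Georg describes, here is a minimal BenchmarkDotNet sketch comparing a plain method against an "instrumented" stand-in, with allocation and timing columns in the report. The workload is invented for illustration; it is not an actual Dynatrace sensor.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // adds allocation columns next to the timing results
public class SensorOverheadBenchmark
{
    private readonly int[] _data = new int[1024];

    [Benchmark(Baseline = true)]
    public int Plain()
    {
        int sum = 0;
        foreach (int value in _data) sum += value;
        return sum;
    }

    [Benchmark]
    public int Instrumented()
    {
        // Stand-in for an injected measurement call around the method body.
        long start = System.Diagnostics.Stopwatch.GetTimestamp();
        int sum = 0;
        foreach (int value in _data) sum += value;
        long elapsed = System.Diagnostics.Stopwatch.GetTimestamp() - start;
        return sum + (int)(elapsed & 0); // keep 'elapsed' observable to the JIT
    }

    // Run with: dotnet run -c Release
    public static void Main() => BenchmarkRunner.Run<SensorOverheadBenchmark>();
}
```

The report then shows the relative cost of the instrumented variant against the baseline, measured after JIT warm-up, which is exactly the "overhead budget" check described above.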
And yeah, Bernhard has a new approach we are introducing step by step at the moment, about reference samples, and I think you might talk something about that, Bernhard.

Well, that's more a topic that's related to functional testing and making sure that we don't mess up customers' applications, but if we want to go on with it, I can give you a short overview of what we do there.

Yes, please.
Well, yeah, one of the biggest challenges besides overhead is of course that we have lots and lots of different versions of third-party libraries. If we look just at, let's say, the SQL clients, there are tons and tons of versions on GitHub. And we have no control over whether customers use an old version that just suits their needs, or whether they're on the bleeding edge and use the latest preview. And since we partly rely heavily on implementation details, because we add some code in these libraries for our sensors, we have to make sure that the code we add is valid in all the different versions of the third-party library.

Earlier, we did this through a huge manual process, where we created a sample application and built it with several selected versions of the third-party library. Recently we have totally automated this, and are able to go to GitHub and say, let's download every assembly, every version of that particular third-party library, to our continuous integration environment, build our samples against it, and automatically test
every version with our agent.
So this helps a lot.
And we also found some minor issues with several technologies there.
And since we do this, we have practically no manual overhead for automatically testing those versions.
This has really been a big improvement in the.NET sector,
and I'm really happy that our team can work with such a platform
that allows such possibilities.
That's pretty cool.
And so that means, as part of our CI, we are making sure that code changes are tested against a large number of different libraries, and also all the different versions, so that our instrumentation doesn't either break things or cause too much overhead. That's phenomenal.

And it also means we test it the minute it gets out. We do a nightly sweep of the NuGet repositories, and the minute a new library appears, it gets tested, even for preview versions.
That's cool.
Hey, in that respect, there's obviously so many libraries out there
and so many different versions.
Do we, because we see a lot of environments, do we also somehow keep track of what is really used out there? Especially, I don't know, maybe we get calls from people, or do we have any way to know what versions of libraries are actively used in our customers' environments?
Yes, we do.
We have that agent information that is displayed in every process information view, where we display the whole technology stack: that's ASP.NET version 4.something, and that's, let's say, SQL client 5.something. And this information also goes into statistics, and you can view them. I think it's just an internal page for now, but this information is available.
It would even be cool if we would be made aware of, let's say, a well-known issue in a certain library. So nothing that comes in with our agent, but maybe there is an issue with the library itself. And we could then proactively notify our users: hey, you're using this library that actually has an issue, you may want to consider, I don't know, upgrading, downgrading, whatever. Because we have a lot of information there.

I think I actually heard this idea somewhere in the company. Maybe I just picked it up somewhere.

Yeah, but I mean, it's logical. It would be awesome, because we have a lot of data there.

And, you know, I have to just jump in and say, all this testing,
all the going back and forth that you have to do: while I applaud the idea of people wanting to try to do open source and do this themselves, when you listen to the effort that the vendors have to go through to make sure we're not impacting things, to keep up with the latest changes, it just highlights the almost insanity of trying to roll your own, unless you're hiring your own dedicated team to spend a lot of time on upkeep. It just seems like so much time and effort put into doing open source and doing it on your own.
And we've all been doing this for years.
I'm not trying to do a commercial for don't use open source.
I mean, there's obviously some cases where it's helpful.
There are cases where you can augment maybe what a vendor isn't doing. But just thinking about the concept of just making sure
your instrumentation works and then getting the data somewhere and presenting it, it just boggles my
mind. So hats off to you all for everything you're doing. And it just really highlights that
while, yes, you can do it yourself, it is not as simple as it seems once you get past the initial stages of,
hey, look, I monitored my code.
Yeah.
I also remember, and I date myself now, but back in the AppMon days, when I had my Share Your PurePath program, where people could send me PurePaths from AppMon and I analyzed them, I remember .NET applications where people had challenges with custom thread pools, custom protocols, a lot of asynchronous activity going on. And then sometimes, you know, kind of asking, why don't I get information into this asynchronous activity? The trace, or the PurePath, as we call it in Dynatrace, doesn't show the data. It stops. Is this a challenge that is still out there in these days of .NET Core, with the new interfaces, or is following asynchronous paths still a challenge that really requires a lot of effort for us to get this data?
Well, yeah, it basically is.
I think the challenge is even growing as more and more APIs switch to asynchronous interfaces.
And the main reason why it is a really big challenge for us is because we have to propagate the path information that we use for tracing a PurePath from one task to another. And back in the days of .NET 4.0, when the task object was introduced with the TPL, and back in the days of .NET 4.5, when we finally advanced to async await syntax, we didn't have a concept of an async local. The AsyncLocal class itself was introduced with .NET 4.6. And of course, it would be much easier to use AsyncLocal for propagating asynchronous context; in fact, if you look at the DiagnosticSource implementation, it happens there just that way. But since we had to support .NET 4.0, we had to do our own implementation, purely based on instrumenting code.
And that was quite a challenge.
And I think the challenge lives on today, because we always have that issue: does it actually belong to that invocation, or is it perhaps best suited to be a PurePath on its own? We always tend to stick them together now.
And also the timing.
Do you want to have the timing of the first synchronous part or do you want to have the timing of the whole operation?
We decided to give you the whole operation and that also took quite an evolution of some
sensors to get there to represent correct timing for really each and every asynchronous
operation you can call in this API.
And this, yeah, well, it went on until a few sprints ago, when we actually finished our support for the asynchronous auto sensors.
So you're finally able to get asynchronous method hotspots
and asynchronous background activity now.
And this will be traced to the actual path where you start your asynchronous work.
So, yeah, it has been a challenge.
It is, I think, an ongoing challenge.
And we will tackle this topic a few times in the future as well, I think.
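A small sketch of the AsyncLocal mechanism Bernhard refers to, available since .NET Framework 4.6: a value set before an await flows into the continuation and into child tasks, which is exactly what agents had to emulate with instrumentation on older runtimes. The names here are illustrative, not the agent's actual implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class TraceContext
{
    // One ambient value per logical async flow.
    private static readonly AsyncLocal<string?> Current = new AsyncLocal<string?>();

    public static string? TraceId
    {
        get => Current.Value;
        set => Current.Value = value;
    }
}

class Demo
{
    static async Task Main()
    {
        TraceContext.TraceId = "purepath-1234";

        await Task.Delay(10); // the continuation may resume on another pool thread
        Console.WriteLine(TraceContext.TraceId); // still "purepath-1234"

        await Task.Run(() =>
            Console.WriteLine(TraceContext.TraceId)); // flows into child tasks too
    }
}
```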
Well, thanks for that. So especially on this asynchronous execution task topic, I think it's worth mentioning that when you want to monitor an application, you're more interested in the logical view, so to say. You want to know how much time in total has been used for some asynchronous execution of a method. But the purely technical view, what is executed when and how, and how much a thread is reused for other stuff and so on, is very complex and changing all the time while executing. And we have to somehow create a user-understandable model and timing in the end, and not just show the technical perspective of what's really going on there. And that's also challenging, because this can be quite complex and quite a lot of work.
I just wanted to mention a short project I stumbled upon, because everybody who has looked at a stack trace from an exception that has been thrown out of a task has seen that the stack is totally messed up. It doesn't represent the way we think as developers; we want to see the stack as it was developed through the await operations. And there is a very nice project out there, I have to dig it out again, perhaps we can put the URLs in the comment section, where they actually managed to beautify the stack again and put together the frames as if they were called synchronously through the asynchronous state machine. This is really awesome. And everybody who develops asynchronous applications, please take a look at those projects. These are awesome.
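The project referred to here appears in the episode links as Ben.Demystifier. A short sketch of how it is typically used, assuming the Ben.Demystifier NuGet package is referenced (its extension methods live in the System.Diagnostics namespace):

```csharp
using System;
using System.Diagnostics; // Ben.Demystifier's extension methods live here
using System.Threading.Tasks;

class Demo
{
    static async Task Main()
    {
        try
        {
            await FailsDeepInsideAsync();
        }
        catch (Exception ex)
        {
            // The raw stack: MoveNext(), AsyncTaskMethodBuilder plumbing, etc.
            Console.WriteLine(ex);

            // The demystified stack: "async Task Demo.FailsDeepInsideAsync()"
            // frames, as you wrote them in the source.
            Console.WriteLine(ex.ToStringDemystified());
        }
    }

    static async Task FailsDeepInsideAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("boom");
    }
}
```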
Yeah, definitely.
If you have those links, we have the chance to put them into the summary of the podcast. And especially because I think this one is going to be very appealing, obviously, for people that are interested in the general topic of monitoring .NET runtimes, but especially engineers and developers that are listening in.
Definitely, let's put these links out there. Now, to that topic: when you were going down that path of explaining, you know, showing the right data to understand what's actually happening, you both are engineers, you both are developers, and I wonder, from a developer-to-developer perspective, what do you like right now about the capabilities we have in our agent that help developers figure out what's going on?
And the second question is, what would you like to see coming in the future?
Well, I especially love metrics.
I've been developing large database applications, and I've encountered GC issues many, many times.
And I really love our GC metrics. I really love how you can see when GC pressure
builds up. And I really love to see that you can easily see on those metrics how an application behaves over time. And if you watch a metric chart for a few hours,
you can easily tell, yeah, well, this is looking good.
This will work 24-7.
Oh, no, this looks odd.
This does not behave right.
And, well, that's a big plus when using Dynatrace. What I also love as a developer is the PurePath view, deep down into the last possible detail. Because I have an easy way of telling: yeah, this is the exact SQL statement I anticipate at this point in the code. This is the exact queue name. Yeah, that's the right file, or that's the right MVC controller method that's been invoked in this call. And this gives me the opportunity to check if anything behaves strangely, to check the detail of an application and see if everything is according to plan.
It's great that you bring up PurePath, because it feels like, you know, we started with PurePath as a distributed tracing technology 15 years ago, but over the last couple of years, I think we just kind of got used to it, that this is a given anyway, because we've done this for so long, and focused so much on the stuff we can do on top of the PurePath. But it is great that you bring this up. This is also why I put the question, because I know you are engineers, and I wanted to hear from you what you actually like and why you would want to use Dynatrace as an engineer. Because it is the level of detail, it is the distributed tracing that we've been doing for the last 15 years. But I think we kind of forgot a little bit, in the way we are advertising and the way we talk about Dynatrace, that this is really the cool, hot thing that we have and always had.
And thanks to guys like you two and your teams,
we keep advancing that technology in all different areas.
And that's what I loved about PurePath, because I look at the PurePath and I know, even though I didn't write the code, I know what's wrong, because I can see the patterns. I can see where time is spent. I can see where things are done that shouldn't be done, and this without any additional effort on my end to put in any type of instrumentation. And that's also what I cherish so much about the PurePath technology.

Yeah, so I have one thing to add which I also love, and I think is great in the product: this is our so-called method hotspots. We call it internally auto sensors, or the new ones, the asynchronous auto sensors coming out.
Since you get a very fast feeling about the application, if you have some methods taking a long time,
and if you're a developer of this application,
you know if some methods are allowed to take longer than others.
And if you see, oh, why is this thing taking so long, it helps a lot to have this kind of view on an application.
And of course, the PurePath view is one of the amazing things from the technical perspective.
Very good.
Brian, did you have another topic?
Because I have one more.
I got one that I hope
I'm not putting you all on the spot for,
but since you're so steeped in.NET
and you're also developers,
I wanted to put you on the spot a little.
So I run into a lot of .NET in our field engagements. Like Andy, I come from a performance background; I used to have to do a lot of .NET testing back in the day, so I can't seem to get away from it. And the one thing that bothers me the most any time I look at a .NET application and its performance issues, and I'm hoping you might have some insight on why this happens: the two areas that I see most performance problems coming from in a .NET
application are either interactions with the database, typically, you know, Microsoft SQL
database queries just randomly slowing down there, or it's time spent in IIS modules, especially
something like Request Executor.
And then when you go in, there's not much going on on the code side behind it.
So I'm wondering if you have any insight why there's a trend, at least from what I'm seeing,
in .NET applications, where a lot of the performance bottlenecks seem to be either in the IIS
modules or on the database, and not necessarily in the code
where it's easy for an operator like me to pop it up,
response time hotspots, here you go, right?
It always goes to more vague things.
Have you all encountered that?
And do you have any ideas of why that might be the case
with .NET applications?
Well, I can try to answer the database question a little bit, at least from my perspective and what I've encountered. And, well, mostly it's connection management. It always comes down to connection management, and it sometimes comes down to GC pressure and spending a lot of time in GCs with large object trees, like models from Entity Framework or stuff like that. You can easily run into performance issues when you excessively use connection pools and the connections from the connection pools. You can easily run into issues when you keep those connections open too long, so the pool either gets exhausted or has to open connections again and again. And, well, my ground rule for using databases is to keep the scope, to keep the timeframe of using the database, as short as possible, and to keep the interface as raw as possible. Well, Entity Framework and OR mappers are really great tools, but, well, for some use cases you have to get down to SqlCommand or SqlConnection, or even use the LINQ to SQL features, where you can access the database a little bit more through a raw interface.
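A minimal sketch of that ground rule with the SqlClient API: open the connection as late as possible and dispose it immediately, so it goes straight back to the pool. The connection string and query are placeholders.

```csharp
using Microsoft.Data.SqlClient;

static class OrderQueries
{
    public static int CountOrders(string connectionString)
    {
        using var connection = new SqlConnection(connectionString);
        using var command = new SqlCommand("SELECT COUNT(*) FROM Orders", connection);
        connection.Open();                    // borrowed from the pool here...
        return (int)command.ExecuteScalar();
    }                                         // ...and returned immediately on dispose
}
```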
Thanks. So let me try to share some thoughts about IIS modules and pipeline steps that are showing up there. I don't have an answer for all of this, but what we have seen is that many, many customers use modules loaded into IIS that are executed before the actual .NET application is doing some work. There are ISAPI modules going around a lot. And I guess there are lots of modules and extensions around, and filtering going on, and some work is handed over to IIS instead of the application itself. And that's why sometimes you don't see anything, since there are native modules and libraries used there.
Maybe I can just add one more thing here, about IIS. I remember, Brian, I'm sure we talked about this, maybe even with Mark Tomlinson, because I think he has been doing a lot of work in his past with Microsoft technology. But IIS Failed Request Tracing, I'm not sure if that is still around, but that was an option to turn on, at least for the short term, some additional logging for these native modules, which at least gives insights on where time is spent.
Yeah.
I remember having those conversations.
Yeah, .NET always just seems to be a different beast. And I'm not trying to knock .NET at all, because I think it does a lot of things really well, especially how they've adapted .NET Core to run on Linux and all the other components.
From my point of view, looking at this code operating within a tool like Dynatrace, it just seems easier to analyze non-.NET code, because there's always some obfuscation with .NET, where it's going to the database or it's in these modules. And, you know, .NET manages so many of the things. Imagine maybe it's your connection pools, right? Not to the database, but even just like on the JVM, let's say, if we go to Java: you can look at your JMX metrics, find out how many connections in the pool you're using, and if you're overextended, you can bump that number up. Whereas .NET is, well, we'll take care of it for you, don't look here, sort of. And you just have to trust .NET to do what it's supposed to do well, which is, I guess, the whole point of it. So it always gives me heartburn every time I get into a .NET engagement.
Anyhow, it's great.
And I really appreciate all the work you're doing with that.
Andy, you had another question there?
I had one more on the list, where I believe we've touched upon some of the challenges already earlier, but there is serverless. I think you mentioned the term Azure Functions earlier, which, based on my understanding, is basically serverless technology. And I'm not sure again who is going to take it, who has more experience or more thoughts on it, but just an overview: how do we correctly monitor, how should we think of monitoring, serverless applications or serverless functions? Is there even an easy way for an agent-based solution to get in? Because as far as I know, these serverless offerings are hosted by, in this case, Microsoft, and I'm not sure if they allow you to install an agent, which means maybe this is all through open tracing anyway. So, just some thoughts on serverless monitoring in the .NET world?

Well, this is a rather complicated question,
and I think it's not fully answered in Dynatrace.
We're just exploring different approaches,
let's say, for Azure Functions,
but this will have to be tackled in other technologies as well
For Azure Functions in particular, it's halfway there, because for the normal consumption plan you have support for a standard profiler, and you can use the injection approach of our profiler, and you can use the agent as it is today. Just select it via the site extension, and we will monitor your Azure Functions. The sensors have been there for a few sprints; I don't know the exact release date right now. For environments where stuff like preloaded runtimes happens, which is what Georg mentioned earlier, where Microsoft warms up your whole environment to just attach the customer's Azure Functions, and doesn't allow a profiler in at the startup of the environment, we have to find a different approach. And we are currently evaluating the possibilities of what we can use there.
This is an ongoing process.
I cannot give you more details about that other than we will come up with a solution.
So to say, we have several solutions in mind and several approaches, and they all have their pros and cons. But there will be something coming soon, I guess. We will sort out the best solution for everyone.
Perfect.
Cool. Hey guys, coming to about 50 minutes into the show, I want to ask first Georg and then Bernhard: is there anything, is there any topic or any last piece of wisdom that you want to get out there in the wild? Saying, this is the most exciting thing that I have in this space, this is the thing that I would wish our users would know. Any final thoughts or any topic?
Let's say something from the developers' perspective. Developers always still tend to try to write very efficient source code; we believe our code is executed very efficiently. But as we know, and as we've seen while reverse engineering and looking at the compiled IL code from a C# or .NET application, and dealing with JIT issues and such things, many, many things are optimized by the JIT. So it's not worth writing complicated code that is hard to read again after some time; most of the time it's optimized by the compiler and the JIT itself. So don't waste too much time: write clear code and try to apply clear programming patterns. I think this is one of the best things you can do to easily identify performance issues afterwards, and to avoid them in the first place, since complicated architectures, complexly written code, and hand-optimized things are often hard to debug anyway.
Thanks for that. I think that's great advice, because as you said, it's complicated enough, the architectures we're dealing with, but if somebody else needs to look at the code, or if you yourself have to look at your own code after months or years, it's very scary and very hard to get in. But if you are applying best practices and good coding practices, then it's going to be easier for everyone, especially because these runtimes have put a lot of effort into making their JITs good and optimized. Very cool. Bernhard, anything from you?

Well, I've had plenty of time to think about it, and I want to add one suggestion: do not always
of time to think about it and i want to add one suggestion do not always
use server GC
mode
server GC mode especially on systems
with a lot of memory
tend to eat up a lot
of RAM and
will only free it
after a huge amount of time
and
especially if it comes to,
yeah, well, Dockerizing and microservices
and running a lot of processes
on one machine, on one big machine,
that can easily lead to troubles.
We have seen that a couple of times now
with several customers,
and they are up with very, very small services, all configured to use
ServerGC that eat up gigabytes of RAMs because their host has, let's say, 200 gigs of RAM.
So we ended up solving almost every issue just by switching back to desktop.
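For anyone wondering which mode their own process actually runs in (the question comes up a moment later in the conversation), here is a quick sketch using the public GCSettings API; the mode itself is configured in runtimeconfig.json ("System.GC.Server") or via <ServerGarbageCollection> in the project file:

```csharp
using System;
using System.Runtime;

class GcModeCheck
{
    static void Main() =>
        // GCSettings.IsServerGC reports the mode the runtime started with.
        Console.WriteLine(GCSettings.IsServerGC
            ? "server GC: one heap per core, throughput-first, RAM-hungry"
            : "workstation (desktop) GC: smaller heaps, frees memory sooner");
}
```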
That's awesome.
Can I ask, is there a way, because obviously I love to take these tidbits and then look for them and see if there are any indicators: is there a way in Dynatrace to determine if it's server or desktop GC? Is there any of the meta information in there I'd be able to find that out from? Or is that something that's under the hood that we couldn't necessarily see at that level?
I think it's in the process metrics.
I don't know.
We show it in our internal logs, since we detect it. I'm not sure if it's put in the product itself, so really prominently. It's a good question, though.
And just to touch on that: if you're seeing very large heaps, with large collections spaced out, it sounds like, because you're saying it'll build up a whole bunch and then after a long time do a big giant collection. So if you're seeing a pattern like that, that might be an indicator of it being run in server mode. Is that how I understood it?
Yeah.
Okay.
That's exactly what you will see.
Great.
But maybe take this back to the product team, right?
If we have this information and you have seen this several times
that this is an indicator of potential problems,
maybe we should put this at least into the process overview.
It would be awesome.
Very cool.
All right, Brian.
Andy.
Still my name and still your name, I understand.
I'm going to have to start calling you Wicket the Ewok, though.
What?
Wicket the Ewok, you know, the Ewoks.
So I want to say, I don't want to do a full summary because I think we have covered so much ground and it's hard to summarize.
But Georg and Bernhard, this is phenomenal, what you have explained to us. It's really great to see and get some insight into what's actually happening in the .NET space around monitoring. I was not aware of the different event sources, like DiagnosticSource, which I also learned today is basically the Microsoft initiative towards OpenTracing. Really interesting to hear that we do a lot of overhead testing and overhead optimization, both at start time and run time. I really love the thing around, I made a note here, version hell testing: the automated testing of every version of every library on GitHub, to make sure nothing breaks. And your two pieces of advice in the end are just beautiful: clean coding practices are better than writing highly optimized, complicated code, because somebody else is optimizing it anyway; and the whole server GC versus desktop GC thing is a great piece of background information. Yeah, very much appreciate that.

And I just want to add too, I know I was getting down on people trying to do open source monitoring. I'll add to that, though, by saying: if none of this conversation has deterred you from wanting to tackle open source monitoring, come work for us instead. You'll get paid to do it properly. Because there's a lot of great stuff going on with these things, and you get to do a lot of fun experimentation and playing with stuff. So yeah, we're always hiring in different areas, so check out our job opportunities site.
Georg and Bernhard, thank you so much for coming on today.
Really appreciate you taking the time,
especially with the time zone differences here.
And good luck with everything else going on, and continue writing awesome, clean code.
Thank you.
I just wanted to say thank you.
And it was really nice that we had the opportunity to join you.
Yeah.
Also, thanks from me for having us and giving us the time to explain some things. And yeah, I hope there will be other very interesting topics coming up soon in the podcast, and I look forward to new episodes here.
Great. And if people have any questions or comments, you can send them to us via Twitter at @pure_dt. Do you all, Georg and Bernhard, are you on social media or LinkedIn? Are you looking to put yourselves out there at all, or do you remain quiet and underground?

I'm on LinkedIn.

I'm on LinkedIn too, and I also have a Twitter account. I think we should put the information somewhere.
We'll put that in, as well as those links. Andy, maybe we can see if we can find that link to that Microsoft tool, if it's still available, for the modules. That would be good to include if we can find it and it's still out there.
Thank you, everyone, for listening, and we'll be back in two weeks. Have a great time, and clean coding ahead.
Bye-bye.