PurePerformance - Bringing Observability to .NET with Georg Schausberger and Bernhard Ruebl

Episode Date: October 5, 2020

Getting visibility into .NET code, whether it runs on a developer machine, on a Windows server on-premises, or as a serverless function in the cloud, is the day-to-day job of Georg Schausberger (@BombadilThomas) and Bernhard Ruebl, part of the Dynatrace .NET Agent Team. In this podcast we hear firsthand about the challenges in bringing observability, monitoring and distributed tracing to the .NET ecosystem. They give us insights into their continued effort to reduce startup and runtime overhead, the innovation coming out of Microsoft as it moves towards open standards, and their novel automated approach to continuously validate that the constant updates of libraries and frameworks don't break monitored code. We also got both to talk about their developer experience when working with commercial tools such as Dynatrace and its PurePath technology, as well as open source tools, when analyzing and debugging their own code or helping users figure out what's wrong with theirs.

In the talk both mentioned other tools which we wanted to provide the links for:
Benchmark.NET: https://benchmarkdotnet.org/articles/overview.html
Ben.Demystifier: https://www.nuget.org/packages/Ben.Demystifier/
IIS Module tracing: https://forums.ivanti.com/s/article/How-To-Enable-IIS-Failed-Request-Tracing
Georg Schausberger: https://twitter.com/BombadilThomas / https://www.linkedin.com/in/georg-schausberger-6898b6141/
Bernhard Rübl: https://www.linkedin.com/in/bernhard-r%C3%BCbl-084881104/

Transcript
It's time for Pure Performance! Get your stopwatches ready, it's another episode of Pure Performance. My name is Brian Wilson and as always with me is my co-host Andy Grabner. Hello Andy, how are you doing in your tree village? In my tree village, I know, yeah. I was surprised to learn about that too, and I had to look out the window, and I think I see two trees in my village. But it seems your commander-in-chief knows better how Austria looks. But yeah, I'm doing pretty well these days, and I'm also, not sure if the audience knows, but I
have a new microphone, and hopefully not only the audio quality improves, but I also hope that the quality of the content at least stays the same. We'll see about that. How about you? Did you hit the content quality button on that mic? Oh yeah, see, that's the secret button at the bottom. I'm doing pretty good.
I'm getting over, you know, just being overwhelmed by everything, but we're getting into some nicer weather. We had, I think when we did the last podcast, we had been getting some snow, and we're back up into some nice weather to carry us out through the fall before it actually gets cold.
So things are going pretty okay, as much as they can in these fun and interesting times of 2020. But happy to be back podcasting. One thing we didn't do last time was thank our listeners for being patient as we had to do a couple of reruns while we lined up some guests. So just thanks everyone for being patient and sticking with us. And yeah, we're back into recording and we have another great new topic today, Andy. Yeah, we talked over the summer about what topics we should focus on, and I think you actually brought up that .NET is a technology we see a lot out there with the companies we work with. And then we remembered we had Philipp Lengauer, one of our engineers from Dynatrace, on the podcast, I think it was a year
ago almost, where he talked about the Java runtime and Java performance. And then you asked me, well, isn't there somebody equivalent to Philipp that can tell us more about .NET, but then particularly also from a monitoring perspective: how do we actually monitor .NET? What are the challenges out there? And that's why, without further ado, I want to introduce actually two guest speakers. And it's funny, we're all sitting in the city of Linz, or at least probably somewhere in the close vicinity, but we're not in the same room, thanks to COVID. But I want to first introduce, or let Georg Schausberger introduce himself, before I hand it over to the second guest. So Georg, good to have you with us, and maybe a quick intro to the audience, who you are and what you do, and yeah.
Yeah, so hello everyone. This is also my first time recording a podcast, so hopefully everything goes well and it's nice and interesting for you. My name is Georg Schausberger, I live in Linz in Austria, and I am part of the .NET development team at Dynatrace. We are developing the code module, which is responsible for getting data out of .NET applications, in all kinds of applications. And yeah, that's pretty much what we are doing. So a lot of things behind the scenes, and reverse engineering of technologies, and a lot of interesting stuff for us. Very cool. And then with you, I know, I guess in a different room or in a different building or in a different
place altogether, is Bernhard Rübl. Now I tried to pronounce your name in English, even though I should know how to pronounce it in German. But Bernhard, thanks for being with us. Thanks for having me. I'm so glad to be here. Well, my name is Bernhard Rübl, that's the correct way to pronounce it.
And I'm also a developer on the .NET team, working together with Georg on our code module. I've been at Dynatrace now for five years or so and also live in Linz. Yeah, I'm happy to tell you stuff about .NET and .NET performance monitoring today. Please go on and ask us. Perfect. All right. So, Brian, I do have a couple of questions and I want to kick it off.
Please do. Okay. No, let's go, guys. Yours might encompass mine, so let's start there. Okay. So I know there is, I mean, there's a long history with .NET. At least I remember when I got started back in .NET, I think I remember .NET 1.x, and I think a lot of things have changed, also from a monitoring perspective. Georg, you mentioned that you do a lot of reverse engineering,
and I probably want to focus more on the more recent developments in the .NET landscape. Can you tell us a little bit about what technology stacks, what versions we are mainly dealing with these days, and where the biggest challenges are, or if things are getting easier from a monitoring perspective: how we get into the .NET runtimes, and especially what has changed? You mentioned re-engineering, or reverse
engineering: is this still something that is extremely necessary, or have also, you know, the vendors, in this case Microsoft, done a better job in allowing vendors like us to get into the runtime and get the data we need? Yeah, so luckily for us, Microsoft also knew about the need of getting some external software into their runtime and getting data out of it. So they included the profiling interface very early in .NET. And this was the first approach I can remember, where we were attaching to the runtime and manipulating data there and changing the compiled programs to get our information out of the software and applications.
But since this is a very technical and complicated way to do things, but still needed to get into many details of applications, they built up new ways to get at performance data more easily. So there are a lot of event-based approaches where you can subscribe to certain events and get data and information from the application. But all of these have to be implemented by the technology stack. So for instance, if you have ASP.NET Core, you get events for the pipeline, for things done, for requests handled. But you have to rely on the data the framework provider implements into the framework itself. And of course, we are not at .NET 1 anymore, but we have had .NET Core out for a while, and many of our customers and companies are going to .NET Core, which means not only Windows is used,
but also Linux, and various architectures like ARM architectures are coming up. And you can't connect to every application runtime directly, since some of them are hosted in Azure. So there's a broad range of different platforms and architectures where .NET is running, and we can't rely on just one of the methods. We have to support many of them.
And that's one of the challenges. We have to keep up with all of them, and in the areas where we can't do it the old way, we have to find new ways to get into the application. So that's interesting, because from the outside you would assume, well, it's a runtime, and regardless of where the runtime runs you should be able to get data in the same consistent way. But it seems there are very platform-specific parts. Is that because we are also running a lot of native code on that platform to, I don't know, get the
data that we need to do the type of instrumentation, or just to be faster and more performant and scalable? Is that why we need to support the different platforms in so many different ways? It's not only about that. We need a native library to connect to the profiling interfaces, as they are called, of the runtime. And if you want to connect the profiler library to the runtime, you have to have access to the system. And for instance, if you have some predefined instances on Azure, it's difficult to install all that stuff there and set everything up. And recently, they were introducing so-called pre-loaded runtimes. So you have an already running program in Azure, and you can't modify or connect
any profiler in the running application so easily. So you're getting stuck there. And on the other side, the more easy thing is, the runtime is built for a specific platform. So if it has to run on Linux, it's a Linux application.
And you have to provide libraries for all platforms and architectures, so that it runs there smoothly. Andy, I wanted to ask one question because you touched upon it here, and maybe I'm getting a little ahead of things, but just briefly: I remember, Andy, when we talked a while back, when we first had our discussion about .NET Core,
there was this concept that if you're running .NET Core on a Windows container, or in any kind of Windows-related system, you can still capture your memory metrics and all the other supporting components. But when you were running it in Java, at least access to those, the JV, I don't want to say the JVM, but you know, the heap and all that other information, was not accessible, at least back then.
And I'm just curious if there's an update, has that changed at all? Or is that still one of the challenges in monitoring .NET Core on, like, a Linux system? Or do I completely misremember that? I think you mixed up Java and Linux, that's what you wanted to say. I remember when we had a discussion about .NET on Windows. Yeah, yeah, exactly, yeah.
Well, we had some improvements there, because Microsoft encountered those issues as well and addressed them with a new event-based system to provide callbacks to monitoring solutions, called NetTrace. And since this is available with, correct me if I'm wrong, but I think it was .NET Core 3.0, it also became available in Dynatrace, and we started to, let's say, collect GC metrics via NetTrace callbacks and do not have to rely on other components of Dynatrace anymore. We can do that directly in the .NET agent now. And I think this is a big improvement
compared to the older implementations, where you had to have performance counters or stuff like that. Right, right. I remember, it's all coming back now. I remember that was part of the issue: Microsoft itself didn't have ways, because they were relying on the performance counters
on the Windows-based systems, but they didn't really have a mechanism for Linux. So it's great to hear that they've improved that. Alright, didn't mean to sidetrack too much, I just wanted to get that in while it's still fresh in my head.
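For listeners who want to poke at these runtime events themselves: since .NET Core 3.0 the GC events can also be consumed in-process with an EventListener. The sketch below is purely illustrative, not how the Dynatrace agent actually collects its metrics (that, as described later, goes through native interfaces):

```csharp
using System;
using System.Diagnostics.Tracing;

// Minimal in-process listener for .NET runtime GC events (illustration only).
public sealed class GcEventListener : EventListener
{
    // 0x1 is the GC keyword of the CLR's runtime event provider.
    private const EventKeywords GcKeyword = (EventKeywords)0x1;

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // The runtime exposes its events through this well-known provider name.
        if (eventSource.Name == "Microsoft-Windows-DotNETRuntime")
        {
            EnableEvents(eventSource, EventLevel.Informational, GcKeyword);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // GCStart, GCEnd, GCHeapStats, ... carry generation and heap-size payloads.
        if (eventData.EventName != null && eventData.EventName.StartsWith("GC"))
        {
            Console.WriteLine($"{eventData.EventName} ({eventData.Payload?.Count ?? 0} payload fields)");
        }
    }
}

// Keeping one instance alive somewhere in the application is enough:
// var gcListener = new GcEventListener();
```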
Yeah, no, that's good. But Bernhard, maybe as a follow-up on this, because you mentioned this event-driven system, and you also put in the word tracing. I mean, again, this comes out of just looking at it from the outside, and maybe we have listeners that don't know what we've been doing over the years in regards to application monitoring and distributed tracing. But you mentioned something about trace information being given to us as we subscribe to events. Does this also include information about actual traces? Is the .NET Core runtime giving us insights into trace data, or is this still something that comes from somewhere completely different? Yeah, well, you have to distinguish between two or three separate systems here.
There is, of course, the old Windows eventing system called ETW, which allowed collecting massive amounts of trace data from a .NET application via tools like PerfView or the performance analyzer tools. Now we are relying more specifically on tools that work within the process of the application itself. There is the NetTrace format that works just for the application itself, when it comes to the profiler callbacks I just mentioned. And there is another tooling that's called DiagnosticSource.
That said, DiagnosticSource is the main source of information when it comes down to collecting trace data from third-party libraries like ASP.NET Core or gRPC, or several other Microsoft libraries that already implemented the changes, like Entity Framework, for example. And this is, of course, an interesting new opportunity where we can dock onto those technologies, and this is where we also can design what we call sensors to collect our data. But the mechanism that is used for, let's say, GC metrics or allocation metrics is completely different: that relies on that NetTrace format, and that is something we have to do in the native component. That is not something that is publicly available in the .NET framework itself. You can of course write some listeners,
but it's not the same as if you hook into the native interfaces. And that's, yeah, that's the main challenge for those new sources, to get them all together, and you can definitely expect to get something out of this in the future. So if I can recap this, and if I understand this correctly, then you say a lot of libraries, either the ones that come with the core runtime or other popular libraries, have pre-instrumentation built in and are then exposing data through these event channels like DiagnosticSource. And obviously, that's great for us to pick up, for any tool vendor basically to pick up. But it also means that if you want to get this data, these libraries have to be pre-instrumented by whoever developed these libraries so that you actually get the data out. You have to use the API from Microsoft that they provided and you have to build that into
your library. Yeah. Now, do you see what the adoption of this is? The reason why I'm asking is because I know there are a lot of new tool vendors out there, both in the commercial and also in the open source space, saying, well, we have .NET monitoring now, and they're latching on to these publicly available interfaces, assuming that most applications deliver some type of diagnostics and tracing data. But based on your experience and based on what we see with our customer base, how big is the adoption of this, and how much visibility do you actually get with what comes, let's say, quote unquote, out of the box, and what is additionally needed to truly get full visibility?
Well, we see a lot of positive feedback and positive comments, even on open source products as well. Most of them are thinking about implementing it. And, well, if Microsoft is somehow involved, the implementation will be done quite quickly. But what I also see is that some open source tools require an additional, let's say, helper library or shim to just encapsulate the original library and add those tracing calls. And of course, if a customer wants to use that, he will have to use that additional library and not the original one. So I think the adoption will strongly depend on whether or not the third-party vendors integrate the support for DiagnosticSource into their libraries.
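To make the "you have to build it into your library" part concrete, here is a rough sketch of the DiagnosticSource pattern: a library writes events, and a monitoring component subscribes to the listeners it understands. The source name, event names and payloads are made up for this example; real libraries such as ASP.NET Core or Entity Framework Core define their own:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Library side: emit events through DiagnosticSource.
// "MyCompany.OrderService" and the event names are invented for this sketch.
public class OrderService
{
    private static readonly DiagnosticSource Diagnostics =
        new DiagnosticListener("MyCompany.OrderService");

    public void PlaceOrder(int orderId)
    {
        if (Diagnostics.IsEnabled("PlaceOrder.Start"))
            Diagnostics.Write("PlaceOrder.Start", new { OrderId = orderId });

        // ... the actual work ...

        if (Diagnostics.IsEnabled("PlaceOrder.Stop"))
            Diagnostics.Write("PlaceOrder.Stop", new { OrderId = orderId });
    }
}

// Consumer side: subscribe to all listeners and hook only the sources you know.
public static class DiagnosticsSubscriber
{
    public static void Attach()
    {
        DiagnosticListener.AllListeners.Subscribe(new DelegateObserver<DiagnosticListener>(listener =>
        {
            if (listener.Name == "MyCompany.OrderService")
                listener.Subscribe(new DelegateObserver<KeyValuePair<string, object>>(evt =>
                    Console.WriteLine($"{evt.Key}: {evt.Value}")));
        }));
    }

    // Tiny helper because the BCL has no public delegate-based IObserver<T>.
    private sealed class DelegateObserver<T> : IObserver<T>
    {
        private readonly Action<T> _onNext;
        public DelegateObserver(Action<T> onNext) => _onNext = onNext;
        public void OnNext(T value) => _onNext(value);
        public void OnCompleted() { }
        public void OnError(Exception error) { }
    }
}
```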
Yeah, and I know there are obviously some initiatives in the open source space around OpenTelemetry and OpenTracing, and Microsoft is also involved in these. Now, I'm not sure how familiar you two guys are with all the efforts around these projects, but wouldn't it be a smarter way to just standardize, to move towards these standards, so that it's, I guess, easier for library vendors to adhere to one standard regardless of the platform, and then also for tool vendors? Well, DiagnosticSource is in fact Microsoft's approach to support OpenTracing. Okay, that's the perfect answer, I didn't know that. Right, that's exactly, yeah, correct me if I'm wrong, Georg, but I think DiagnosticSource is exactly the tool from Microsoft to open up spans, to write bags of keys and values to spans,
and export that into the OpenTracing span API. And therefore, if you use DiagnosticSource to report tracing data in your third-party application, you will have the possibility to use those tools. Yeah, exactly. Also from my view, this is the API to use to define all the data you need for OpenTracing, since OpenTracing is more than just providing data from this one application: it provides tracing through different applications in the end, since you have spans connecting from one application to another as well. So this is also
part of the game, since most of the data sources we were talking about before are just information from within the application you are using: how is your application performing, what requests are coming in and out, and how long do they take to be processed, and so on. Very cool. Okay, thanks, that was good information, I didn't know that. And I guess there are so many topics floating around, and that's great too. I think Brian and I always use this podcast to educate ourselves a lot, because we learn a lot with every podcast, and hopefully the listeners that are not as deep down in the weeds as you guys are with .NET technology and monitoring and tracing also learn something new here.
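The span side of that story is the Activity class from the same System.Diagnostics.DiagnosticSource package: parent and child activities model the spans, and the Id that is put on outgoing calls lets the next service continue the trace. A small sketch with made-up operation names, not the exact code any particular framework uses:

```csharp
using System;
using System.Diagnostics;

public static class ActivityExample
{
    public static void HandleIncomingRequest(string incomingParentId)
    {
        var serverSpan = new Activity("HTTP GET /orders");   // operation name is illustrative
        if (!string.IsNullOrEmpty(incomingParentId))
            serverSpan.SetParentId(incomingParentId);         // continue the distributed trace
        serverSpan.AddTag("http.method", "GET");
        serverSpan.Start();                                   // becomes Activity.Current

        try
        {
            CallDatabase();
        }
        finally
        {
            serverSpan.Stop();
        }
    }

    private static void CallDatabase()
    {
        // Child span: picks up Activity.Current as its parent automatically.
        var dbSpan = new Activity("SQL SELECT").Start();

        // dbSpan.Id is what you would put on the outgoing protocol (HTTP header,
        // message property, ...) so the next service can call SetParentId with it.
        Console.WriteLine($"span {dbSpan.Id}, parent {dbSpan.ParentId}");

        dbSpan.Stop();
    }
}
```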
I'm not sure who wants to take this, either Georg or Bernhard, but I know that over the years, and I've been with Dynatrace for 12 years now, one of the biggest questions that always comes up from everybody is the overhead question. Because every time you are instrumenting an app and you're loading something into an app, you are obviously changing the behavior. There's overhead, and there's all sorts of overhead, whether it's, you know, resource overhead in terms of memory, or whether it is additional CPU cycles that we consume. Now, how are we, or how are you guys, ensuring that overhead stays minimal? What are the things we've built into our agent technology to really make sure we can monitor at that scale
without having an overhead that is impacting the application? I'm not sure who wants to take it, but OK. I'm perfectly happy if Georg takes the terribly difficult question. Okay, thank you very much for giving the question to me. That's a very good question. So there are all kinds of overhead, as you already mentioned. One thing is the old, big instrumentation approach, where we are loaded as a library
into the profiling interface and we are modifying the assemblies and the .NET code as it is loaded into the runtime. This adds some startup overhead, since we have to build up a model of the assemblies and the application that is loaded, manipulate data, and write measurement method calls into the interesting parts of the application. And this takes quite some time to do. So this is the startup overhead, which is very bad for things like Azure Functions, where applications start up all the time, and if you want to scale up, you want to have the functions instantaneously
up and running and taking over requests. That's one thing, and we address it by internally testing how long things take, and we do profiling of our own profiler to measure things there. And we have to tackle everything we can there in terms of not wasting memory and not doing things all the time. At the moment we are improving things there as well.
Again, we want to avoid manipulating data we don't need in the application. So if someone doesn't want to monitor, let's say, some messaging technology, we will disable this part of our library so we don't cause this overhead. On the other side, for the runtime overhead, we have a lot of KPI tests and performance tests, and we are using BenchmarkDotNet internally for testing our own implementations, so we get very good statistics. This is also a good point to advertise the BenchmarkDotNet project, which is fantastic, in my opinion. You can write very good unit tests and test running applications and see how much overhead a single method or a manipulated method causes in terms of allocations and CPU overhead. You get the data of the repeated runs, so what's
happened after the JIT has optimized the code. And these tests make sure that we stay within our boundaries.
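For reference, a BenchmarkDotNet micro-benchmark along those lines looks roughly like this. The "sensor" calls are empty stand-ins just to have something to compare against a baseline; this is not the actual Dynatrace KPI suite:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // also reports allocations per operation
public class InstrumentationOverheadBenchmark
{
    [Benchmark(Baseline = true)]
    public int Plain() => Work(42);

    [Benchmark]
    public int Instrumented()
    {
        FakeSensor.Enter();           // stands in for an injected measurement call
        try { return Work(42); }
        finally { FakeSensor.Exit(); }
    }

    private static int Work(int n) => n * n;

    private static class FakeSensor
    {
        public static void Enter() { }
        public static void Exit() { }
    }
}

public static class Program
{
    // BenchmarkDotNet runs each method many times, after JIT warm-up,
    // and reports mean time, allocations and the ratio to the baseline.
    public static void Main() => BenchmarkRunner.Run<InstrumentationOverheadBenchmark>();
}
```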
And, yeah, Bernhard has a new approach we are introducing step by step at the moment, about reference samples, and I think you might talk a bit about that, Bernhard. Well, that's more a topic related to functional testing and making sure that we don't mess up customers' applications, but if we want to go on with it, I can give you a short overview of what we do there. Yes, please. Well, yeah, one of the biggest challenges besides overhead is of course that we have lots and lots of different versions of third-party libraries. If we look just at, let's say, the SQL clients, there are tons and tons of versions on GitHub. And we have no control over whether the customers use an old version that just suits their needs, or if they're on the bleeding edge and use the latest preview. And since we partly rely heavily on implementation details as well, because we add some code in these libraries for our sensors, we have to make sure that this code that we add is valid in all the different versions of the third-party library. Earlier, we did this with a huge manual process where we created a sample application and built it with several selected versions of the third-party library. And recently we have totally automated this and are able to
go to GitHub and say, let's download every assembly, every version of that particular third-party library, to our continuous integration environment, build our samples against it, and automatically test every version with our agent. So this helps a lot, and we also found some minor issues with several technologies there. And since we do this, we have practically no manual overhead for automatically testing those versions. This has really been a big improvement in the .NET sector, and I'm really happy that our team can work with such a platform
that allows such possibilities. That's pretty cool. And so that means, as part of our CI, we are making sure that code changes are tested against a large number of different libraries and also all the different versions, so that we make sure that our instrumentation doesn't either break things or cause too much overhead. That's phenomenal. And it also means we test it the minute it gets out. We do a nightly sweep of the NuGet repositories, and the minute a new library version appears, it gets tested,
even for preview versions. That's cool.
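As a small aside for the curious: the list of all published versions of a package is easy to get hold of, for example from NuGet's public flat-container endpoint. A rough sketch of that one step only; the pipeline described above is of course far more involved, and the package name here is just an example:

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class NuGetVersionSweep
{
    // Returns every published version of a package from the NuGet flat-container API.
    public static async Task<string[]> GetAllVersionsAsync(string packageId)
    {
        using var http = new HttpClient();
        var url = $"https://api.nuget.org/v3-flatcontainer/{packageId.ToLowerInvariant()}/index.json";

        using var stream = await http.GetStreamAsync(url);
        using var doc = await JsonDocument.ParseAsync(stream);

        var versions = doc.RootElement.GetProperty("versions");
        var result = new string[versions.GetArrayLength()];
        for (int i = 0; i < result.Length; i++)
            result[i] = versions[i].GetString();
        return result;
    }

    public static async Task Main()
    {
        // Each of these versions could then be restored and tested in CI.
        foreach (var version in await GetAllVersionsAsync("Microsoft.Data.SqlClient"))
            Console.WriteLine(version);
    }
}
```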
Hey, in that respect, there are obviously so many libraries out there and so many different versions. Do we, because we see a lot of environments, do we also somehow keep track of what is really used out there? Especially, I don't know, maybe we get calls from people. Do we have any way to know what versions of libraries are actively used in our customer environments? Yes, we do. We have that agent information that is displayed in every process view, where we display the whole technology stack: that's ASP.NET version 4.something, and that's, let's say, SQL client 5.something. This information also goes into statistics and you can view them. I think it's just an internal page for now, but this information is available.
It would even be cool if we would be made aware of, let's say, a well-known issue in a certain library. So nothing that comes in with our agent, but maybe there is an issue with the library itself. And we could then proactively notify our users: hey, you're using this library that actually has an issue, you may want to consider, I don't know, upgrading, downgrading, whatever, because we have a lot of information there. I like this idea. Yeah, I think I actually heard this idea somewhere in the company, maybe I just picked it up somewhere. But I mean, it's logical, and it would be awesome, because we have a lot of data there.
And, you know, I have to just jump in and say: all this testing, all the going back and forth that you have to do, while I applaud the idea of people wanting to try to do open source and do this themselves, when you listen to the effort that the vendors have to go through to make sure we're not impacting things and to keep up with the latest changes, it just highlights the almost insanity of trying to roll your own, unless you're hiring your own dedicated team to spend a lot of time on upkeep. It just seems like so much time and effort to put into doing open source and doing it on your own. And we've all been doing this for years. I'm not trying to do a commercial for "don't use open source". I mean, there are obviously some cases where it's helpful. There are cases where you can augment maybe what a vendor isn't doing. But just thinking about the concept of just making sure your instrumentation works, and then getting it to somewhere and presenting it, it just boggles my mind. So hats off to you all for everything you're doing. And it just really highlights that while, yes, you can do it yourself, it is not as simple as it seems once you get past the initial stages of, hey, look, I monitored my code.
Yeah. I also remember, and I date myself now, but back in the AppMon days, when I had my Share Your PurePath program where people could send me PurePaths from AppMon and I analyzed them, I remember .NET applications where people had challenges with custom thread pools, custom protocols, a lot of asynchronous activity going on. And then sometimes asking, why don't I get information into this asynchronous activity?
The trace, or the PurePath as we call it in Dynatrace, doesn't show the data. It stops. Is this a challenge that is still out there in these days of .NET Core with the new interfaces, or is following asynchronous paths still a challenge that really requires a lot of effort for us to get this data? Well, yeah, it basically is. I think the challenge is even growing, as more and more APIs switch to asynchronous interfaces. And the main reason why it is a really big challenge for us is because we have to propagate that path information that we use for tracing a PurePath from one task to the next. And back in the days of .NET 4.0,
when the Task object was introduced with the TPL, and back in the days of .NET 4.5, when we finally advanced to async/await syntax, we didn't have a concept of an async local. AsyncLocal, the class itself, was only introduced with .NET 4.6. And of course, it would be much easier to use AsyncLocal for propagating asynchronous context; in fact, if you look at the DiagnosticSource implementation, it happens exactly that way there. But since we had to support this back on .NET 4.0, we had to do our own implementation purely based on instrumenting code. And that was quite a challenge. And I think the challenge lives on to this day, because we always have the issue of: does this asynchronous work actually belong to that invocation, or is it perhaps best suited to be a PurePath of its own?
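For anyone who has not met it yet, this is roughly what AsyncLocal-based context propagation looks like: the value flows across awaits and task continuations on its own, which is exactly the kind of plumbing that had to be hand-built through instrumentation on .NET 4.0 and 4.5. The "trace id" here is just a placeholder string:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncContextDemo
{
    // Flows with the logical call context, across awaits and thread switches.
    private static readonly AsyncLocal<string> TraceContext = new AsyncLocal<string>();

    public static async Task Main()
    {
        TraceContext.Value = "purepath-1234";   // placeholder trace/path id

        await Task.Delay(10);                   // continuation may run on another thread...
        await DoWorkAsync();

        Console.WriteLine(TraceContext.Value);  // ...but the value is still "purepath-1234"
    }

    private static async Task DoWorkAsync()
    {
        await Task.Yield();
        // Any code on this logical call path sees the same context value.
        Console.WriteLine($"inside DoWorkAsync: {TraceContext.Value}");
    }
}
```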
We always tend to stick them together now. And there's also the timing: do you want to have the timing of the first synchronous part, or do you want to have the timing of the whole operation? We decided to give you the whole operation, and that also took quite an evolution of some sensors to get there, to represent correct timing for really each and every asynchronous operation you can call in this API. And this, yeah, well, it went on until a few sprints ago, when we actually finished our support for asynchronous auto-sensors.
So you're finally able to get asynchronous method hotspots and asynchronous background activity now. And this will be traced to the actual path where you start your asynchronous work. So, yeah, it has been a challenge, it is, I think, an ongoing challenge, and we will tackle this topic a few more times in the future as well, I think. Well, thanks for that. Especially on this topic of asynchronous task execution, I think it's worth mentioning that as someone monitoring an application,
you're more interested in the logical view, so to say. You want to know how much time in total has been used for some asynchronous execution of a method. But the purely technical side, what is executed when and how, and how much a thread is reused for other stuff and so on, is very complex and changes all the time while executing. And we have to somehow create a user-understandable model and timing in the end, and not just show the technical perspective of what's really going on there.
And that's also challenging, because this can be quite complex and quite a lot of work. I just wanted to mention a small project I stumbled upon: everybody who has looked at the stack of an exception that has been thrown out of a task sees that the stack is totally messed up. It doesn't represent the way we think as developers; we want to see the stack as it was developed through the actual operations. And there is a very nice project out there, I have to dig it out again, perhaps we can put the URLs in the comment section, where they actually managed to beautify the stack again and put together the frames as if they were called synchronously through the asynchronous
state machine. This is really awesome, and everybody who develops asynchronous applications, please take a look at those projects. These are awesome. Yeah, definitely, if you have those links, we'll have the chance to put them into the summary of the podcast.
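The project referred to here is presumably Ben.Demystifier, which is linked in the show notes above. Going from memory of its API, the core of it is a Demystify() extension on Exception that rewrites async state machine, iterator and lambda frames into something that reads like the code you actually wrote:

```csharp
using System;
using System.Diagnostics;   // the package's extension methods live in this namespace
using System.Threading.Tasks;

public static class DemystifyDemo
{
    public static async Task Main()
    {
        try
        {
            await OuterAsync();
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.ToString());             // raw: MoveNext() and compiler-generated frames
            Console.WriteLine(ex.Demystify().ToString()); // cleaned-up, developer-friendly frames
        }
    }

    private static async Task OuterAsync()
    {
        await Task.Yield();
        await InnerAsync();
    }

    private static async Task InnerAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("boom");
    }
}
```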
And especially because I think this one is going to be very appealing, obviously for people that are interested in the general topic of monitoring .NET runtimes, but especially for the engineers and developers that are listening in. Definitely, let's put these links out there. Now, to that topic: when you were going down that path of explaining, you know, showing the right data to understand what's actually happening: you both are engineers, you both are developers, and I wonder, from a developer-to-developer perspective, what do you like right now about the capabilities we have in our agent that helps developers figure out what's going on? And the second question is, what would you like to see coming in the future? Well, I especially love metrics. I've been developing large database applications, and I've encountered GC issues many, many times. And I really love our GC metrics. I really love how you can see when GC pressure builds up, and I really love that you can easily see on those metrics how an application behaves over time. And if you watch a metric chart for a few hours,
you can easily tell, yeah, well, this is looking good, this will work 24/7. Or: oh no, this looks odd, this does not behave right. And, well, that's a big plus when using Dynatrace. What I also love as a developer is the PurePath view, deep down into the last possible detail, because I have an easy way of telling, yeah, this is the exact SQL statement I anticipate at this point in the code. This is the exact queue name.
Yeah, that's the right file, or that's the right MVC controller method that's been invoked in this call. And this gives me the opportunity to check if anything behaves strangely, to check the detail of an application and see if everything is according to plan. It's great that you bring up PurePath, because it feels like, you know, we started with PurePath as a distributed tracing technology 15 years ago, but over the last couple of years I think we just kind of got used to it, that this is a given anyway, because we've done this for so long and focused so much on the stuff we can do on top of the PurePath. But it is great that you... this is also why I put the question, because I know you are engineers and I wanted to hear from you what you actually like and why you want to use Dynatrace as an engineer.
Because it is the level of detail, it is the distributed tracing that we've been doing for the last 15 years. But I think we kind of forgot a little bit, in the way we are advertising and the way we talk about Dynatrace, that this is really the cool, hot thing that we have and always had. And thanks to guys like you two and your teams, we keep advancing that technology in all different areas. And that's also what I loved about PurePath: I look at the PurePath and, even though I didn't write the code, I know what's wrong, because I can see the patterns. I can see where time is spent. I can see where things are done that shouldn't be done, and this without
any additional effort on my end to put in any type of instrumentation. And that's also what I cherish so much about the PurePath technology. Yeah, so I have one thing to add which I also love and I think is great in the product: this is our so-called method hotspots. We call them internally auto-sensors, or the new ones, the asynchronous auto-sensors coming out. You get a very fast feeling for the application: if you have some methods taking a long time, and if you're a developer of this application, you know whether some methods are allowed to take longer than others. And if you see, oh, why is this one thing taking so long? It helps a lot to have this kind of view of an application. And of course, the PurePath view is one of the amazing things from the technical perspective. Very good. Brian, did you have another topic? Because I have one more.
I got one that I hope I'm not putting you all on the spot for, but since you're so steeped in .NET and you're also developers, I wanted to put you on the spot a little. So I run into a lot of .NET in our field engagements. Like Andy, I come from a performance background, I used to have to do a lot of .NET testing back in the day, so I can't seem to get away from it. And there's one thing that bothers me the most any time I look at a .NET application and its performance issues, and I'm hoping you might have some insight on why this happens. The two areas that I see most performance problems coming from in a .NET
application are either interactions with the database, typically, you know, Microsoft SQL database queries just randomly slowing down, or it's time spent in IIS modules, especially something like Request Executor. And then when you go in, there's not much going on on the code side behind it. So I'm wondering if you have any insight into why there's a trend, at least from what I'm seeing, in .NET applications, where a lot of the performance bottlenecks seem to be either in the IIS modules or in the database, and not necessarily in the code,
where it's easy for an operator like me to pull up response time hotspots: here you go, right? It always goes to more vague things. Have you all encountered that? And do you have any ideas why that might be the case with .NET applications? Well, I can try to answer the database question a little bit, at least from my perspective and what I've encountered. And, well, mostly it's connection management.
It always comes down to connection management, and it sometimes comes down to GC pressure and spending a lot of time in GCs with large object trees, like models from Entity Framework or stuff like that. You can easily run into performance issues when you use connection pools excessively and take connections from the connection pools. You can easily run into issues when you keep those connections open too long, so the pool either gets exhausted or has to open connections again and again. And, well, my ground rule for using databases is to keep the scope, the timeframe of using the database, as short as possible,
and to keep the interface as raw as possible. Well, Entity Framework and OR mappers are really great tools, but, well, for some use cases you have to get down to SqlCommand or SqlConnection, or even use the LINQ to SQL features, where you can access the database on a bit more of a raw interface.
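A tiny illustration of that ground rule, with a placeholder connection string and query: open the pooled connection late, dispose it early, and stay close to the raw ADO.NET interface so the time spent holding the connection stays obvious:

```csharp
using System.Data.SqlClient;   // Microsoft.Data.SqlClient on newer stacks

public static class OrderRepository
{
    // Placeholder connection string for the sketch.
    private const string ConnectionString =
        "Server=.;Database=Shop;Integrated Security=true";

    public static int CountOrders(int customerId)
    {
        // Dispose returns the underlying connection to the pool immediately.
        using (var connection = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand(
            "SELECT COUNT(*) FROM Orders WHERE CustomerId = @customerId", connection))
        {
            command.Parameters.AddWithValue("@customerId", customerId);
            connection.Open();
            return (int)command.ExecuteScalar();
        }
    }
}
```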
Thanks. So let me try to share some thoughts about the IIS modules and pipeline steps that are showing up there. I don't have an answer for all of this, but what we have seen is that many, many customers use modules loaded into IIS that are executed before the actual .NET application does its work. There are ISAPI modules going around a lot,
and I guess there are lots of modules and extensions around, and filtering going on, and some work is handed over to IIS instead of the application itself. And that's why sometimes you don't see anything, since there are native modules and libraries used there. On IIS, maybe I can just add one more thing here. I remember, Brian, I'm sure we talked about this,
maybe even with Mark Tomlinson, because I think he has been doing a lot of work in his past with Microsoft technology. But the IIS Request Tracing feature, I'm not sure if that is still around, but that was an option to turn on, at least for a short time, some additional logging for these native modules, which at least gives insights on where time is spent. Yeah.
I remember having those conversations. Yeah, .NET always just seems to be a different beast. And I'm not trying to knock .NET at all, because I think it does a lot of things really well, especially how they've adapted .NET Core to run on Linux and all the other components. From my point of view, looking at this code operating within a tool like Dynatrace, right?
It just seems easier, from my point of view, to analyze non-.NET code, because there's always some obfuscation with .NET, where it's going to the database or it's in these modules. And, you know, .NET manages so many of the things. Like, imagine maybe it's your connection pools, right? Not to the database, but even just, like, on the JVM, let's say if we go to Java, right? You can look at your JMX metrics,
find out how many connection pools you're using, and if you're overextended, you can bump that number up. Whereas .NET is like, well, we'll take care of it for you, don't look here, sort of. And, you know, you just have to trust .NET to do what it's supposed to do well, which is, I guess, the whole point of it.
So it always just gives me heartburn every time I get into a .NET engagement. Anyhow, it's great, and I really appreciate all the work you're doing with that. Andy, you had another question there? I had one more on the list. I believe we've touched upon some of the challenges already earlier, but there is a serverless topic: I think you mentioned the term Azure Functions earlier, which, based on my understanding, is basically serverless
technology. And I want to just, I'm not sure again who is going to take it, who has more experience or more thoughts on it, but just an overview: how do we correctly monitor, how should we think of monitoring, serverless applications or serverless functions? Is there even an easy way for an agent-based solution to get in? Because as far as I know, these serverless offerings are hosted by, in this case, Microsoft, and I'm not sure if they allow you to install an agent, which means maybe this is all through open tracing anyway.
So just some thoughts on serverless monitoring in the .NET world? Well, this is a rather complicated question, and I think it's not fully answered in Dynatrace yet. We're just exploring different approaches, let's say, for Azure Functions, but this will have to be tackled for other technologies as well. For Azure Functions in particular it's halfway there.
Because for the normal consumption plan you have the support for a standard profiler, and you can use the injection approach of our profiler, and you can use the agent as it is today. Just select it via the site extension and we will monitor your Azure Functions. The sensors have been there for a few sprints now; I don't know the exact release date right now. For environments where stuff like preloaded runtimes happens, that's what Georg mentioned earlier, where Microsoft warms up your whole environment to just attach the customer's Azure Functions and doesn't allow a profiler in at the startup of the environment, we have to find a different approach. And we are currently evaluating the possibilities of what we can use there.
This is an ongoing process. I cannot give you more details about that, other than we will come up with a solution. So to say, we have several solutions in mind and several approaches, and they all have their pros and cons. But there will be something coming soon, I guess. We will sort out the best solution for everyone.
Perfect. Cool. Hey guys, coming to about 50 minutes into the show, I want to ask the two of you: is there anything, is there any topic or any last piece of wisdom, that you want to kind of get out there in the wild, saying, this is the most exciting thing that I have in this space, this is the thing that I wish our users would know? Any final thoughts or any topic? Let's say something from the developer's perspective. Developers always still tend to try to write very efficient source code, so that the code, we believe in our code,
that it's very efficiently executed. But as we know, and as we've seen while reverse engineering and looking at the compiled IL code from C# or .NET applications, and having to deal with JIT issues and things, many, many things are optimized by the JIT. So it's not worth it to write complicated code that you have to read again after some time; most of the time it's optimized by the compiler and the JIT itself. So don't waste too much time, and write clear code and try to apply clear programming patterns.
So I think this is one of the best things you can do to easily identify performance issues afterwards, and to avoid them in the first place, since complicated architecture, complexly written code, and hand-optimized things are often hard to debug anyway. Thanks for that. I think that's great advice, because as you said, the architectures we're dealing with are complicated enough, but if somebody else
needs to look at the code, or if you yourself have to look at your own code after months or years, it's very scary and very hard to get in. But if you are applying best practices and good coding practices, then it's going to be easier for everyone, especially because these runtimes have put in a lot of effort into making their
JITs good and optimized. Very cool. Bernhard, anything from you? Well, I had plenty of time to think about it, and I want to add one suggestion: do not always use server GC mode. Server GC mode, especially on systems with a lot of memory, tends to eat up a lot of RAM
and will only free it after a long time. And especially when it comes to, yeah, well, Dockerizing and microservices, and running a lot of processes on one machine, on one big machine, that can easily lead to trouble.
We have seen that a couple of times now with several customers: they end up with very, very small services, all configured to use server GC, that eat up gigabytes of RAM, because their host has, let's say, 200 gigs of RAM. So we ended up solving almost every issue just by switching back to desktop GC.
That's awesome. Can I ask, is there a way, because it's something obviously I love to take these tidbits and then look for them and see if there are any indicators: is there a way in Dynatrace to determine if it's server or desktop GC? Any of the meta information in there where I'd be able to find that out, or is that something that's under the hood that we couldn't necessarily see at that level?
I think it's in the process metrics, I don't know. We show it in our internal logs, since we detect it. I'm not sure if it's put into the product itself really prominently. It's a good question, though. And just to touch on that: so you're seeing very large heaps with large
collections spaced out, it sounds like, because you're saying it'll build up a whole bunch and then after a long time do a big, giant collection. So if you're seeing a pattern like that, that might be an indicator of it being run in server mode. Is that how I understood it?
Yeah. Okay. That's exactly what you will see. Great. But maybe take this back to the product team, right? If we have this information, and you have seen several times that this is an indicator of potential problems, maybe we should put this at least into the process overview.
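For what it's worth, the GC flavor is also easy to check from inside the process itself, and switching it is a configuration setting rather than code. A small sketch; the switches in the comments are the standard .NET Core ones:

```csharp
using System;
using System.Runtime;

public static class GcModeInfo
{
    public static void Print()
    {
        Console.WriteLine($"Server GC:    {GCSettings.IsServerGC}");
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");

        // Turning server GC off for a .NET Core app is configuration, e.g.:
        //   runtimeconfig.json:  "configProperties": { "System.GC.Server": false }
        //   csproj:              <ServerGarbageCollection>false</ServerGarbageCollection>
        //   environment:         COMPlus_gcServer=0 (DOTNET_gcServer=0 on newer runtimes)
    }
}
```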
It would be awesome. Very cool. All right, Brian. Andy. Still my name and still your name, I understand. I'm going to have to start calling you Wicket the Ewok, though. What? Wicket the Ewok, you know, the Ewoks.
So I want to say, I don't want to do a full summary, because I think we have covered so much ground and it's hard to summarize. But Georg and Bernhard, it's phenomenal what you have explained to us. It's really great to see and get some insight into what's actually happening in the .NET space around monitoring. I was not aware of the different event sources like DiagnosticSource, which I also learned today is basically the Microsoft open source initiative towards open tracing. Really interesting to hear that we do a lot of overhead testing and overhead optimization, both at start time and runtime. I really love the thing around, I made a note here, version-hell testing: the automated testing of every
version of every library on GitHub to make sure nothing breaks. And your two pieces of advice in the end are just beautiful: clean coding practices are better than writing highly optimized, complicated code, because somebody else is optimizing it, and the whole server GC versus desktop GC, that's a great piece of background information. Really, very much appreciate that. And I just want to add too, I know I was getting down on people trying to do open source monitoring, so I'll add to that by saying: if none of this conversation has deterred you from wanting to tackle open source monitoring, come work for us instead. You'll get paid to do it properly.
Because there's a lot of great stuff going on with these things, and you get to do a lot of fun experimentation and playing with stuff. So yeah, we're always hiring in different areas, so check out our job opportunities page. Georg and Bernhard, thank you so much for coming on today. Really appreciate you taking the time, especially with the time zone differences here. And good luck with everything else going on, and continue writing awesome, clean code.
Thank you. I just wanted to say thank you, and it was really nice that we had the opportunity to join you. Yeah. Also thanks from me for having us and giving us the time to explain some things. And yeah, I hope there will be other very interesting topics coming up soon in the podcast, and I'm looking forward to new episodes here. Great.
And if people have any questions or comments, you can send them to us via Twitter at @pure_dt. Do you all, Georg and Bernhard, are you on social media or LinkedIn, are you looking to put yourselves out there at all, or do you remain quiet and underground?
I'm on LinkedIn. I'm on LinkedIn too, and I also have a Twitter account, so I think we should put the information somewhere.
We'll put that, as well as those links. Andy, maybe we can see if we can find that link to that Microsoft tool, if it's still
available, for the modules. That would be good to include if we can find it and it's still out there. Thank you
everyone for listening, and we'll be back in two weeks. Have a great time and clean coding ahead. Bye-bye.
