PurePerformance - 005 Top .NET Performance Problems
Episode Date: June 20, 2016. Microsoft does a good job of shielding us from the complexity of what is going on in the CLR. Until now, Microsoft has taken care of optimizing the Garbage Collector and tries to come up with good defaults for thread and connection pool sizes. The problem, though, is that even the best optimizations from Microsoft are not good enough if your application suffers from poor architectural decisions or simply bad coding. Listen to this podcast to learn about the top problems you may suffer in your .NET application. We have many examples, and we discuss how you can do a quick sanity check on your own code to detect bad database access patterns, memory leaks, thread contention, or simply bad code that results in high CPU, synchronization issues, or even crashes!
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello and happy Cinco de Mayo everybody, or belated Cinco de Mayo.
As you all know, we like to record these ahead of time, but today is actually Cinco de Mayo.
And my name is Brian Wilson and we have as always Andy Grabner. Hello Andy. Hey, and it's actually one day after May the 4th be with you,
right? It's actually very interesting. That's right.
These two significant days are so close to each other.
It is.
There's a lot of great pictures from
May the 4th.
The funny thing about Cinco de Mayo,
at least in this country, is
are you familiar much about
what Cinco de Mayo is really about?
Well, what I know, I think at least, it's the Mexican National Holiday, right?
I would say National Day, yeah.
Well, it's a holiday, yes.
It's a very safe answer.
And I guess the people in the U.S. may just use it as an excuse to get drunk.
I don't know.
Yeah, that's basically it.
A lot of people also think it's Mexican independence, but it has nothing to do with Mexican independence.
It was a Mexican holiday celebrating the Battle of Puebla on May 5th, 1862.
And yes, I do have Wikipedia open in front of me right now.
It had nothing to do with Mexican independence.
Basically, they beat the French out of a town.
And the reason the United States back then was happy about it is because we were looking at the French as possibly encroaching into the United States from there, from Mexico.
So with Mexico beating the French out, we kind of saw that as a little bit of an extra safety net probably before we took over a lot of Mexican land in what is now the United States. And so basically we just – well, I learned a lot and hopefully our listeners also learned a lot what they didn't even sign up for.
I mean it's good.
Yes.
It's a historical thing.
Yeah.
But the reality is it's just a bunch of people go get drunk.
Yeah.
Well, we have a party here today in the office.
Mexican beers.
Oh, you do?
For Cinco de Mayo?
Yeah, we do.
We have – am I allowed to say that? I'm sure I am.
Starting at 3:30, which is in about an hour,
we have two margarita machines and we have some Mexican
beer and we also have craft beer,
US craft beer on tap.
That's what I'm saying.
People just take it as an excuse to get
trashed. Yes, especially
at work. I do not have any.
I don't have any Mexican beer.
No?
Well, we can fix that.
Go to a store, buy something.
And it's almost, it won't be served while we're still recording this.
That's true.
Unfortunately, you can't have it enhance this week's episode.
So this week's episode, what are we talking about?
.NET performance.
Exactly, because I think last week when we were recording, we talked about Java performance.
It might not be a week, but last episode, yes.
Last episode, that's right.
But I mean, it's still fresh in my memory.
We were sitting in Denver in a hotel and recording it.
Now we're half a continent apart.
Yeah, we're talking about .NET performance.
Why .NET performance?
Because obviously, from a diagnostics perspective, the two major technologies we see are Java and .NET. And I think there are a lot of myths about Java and .NET performance, especially in .NET, because there are so many things that can go wrong just as easily as with Java.
And I think when we did a little chat before we got started, we also said,
there are also a lot of, let's say, things you cannot control as a .NET developer. You're not in control as much as in Java, because Microsoft tries to make it very easy for developers to write code and then run the code, and then they just take care of it. And there are some really interesting things, especially if you run apps in IIS, which I believe is where most ASP.NET apps, web apps, or services run. And there's a lot of stuff that Microsoft takes care of for you, like automatically recycling the app pools in case you do something stupid and run into a memory leak and things like that.
Yeah, and there are a lot of benefits. I like to knock .NET just because, well, that was the main programming language I was working with back when I was running performance testing.
And it always just seemed to have issues.
But it's, you know, I guess it's easy to knock Microsoft.
But there's actually a lot of great things that it does for you.
And a lot of that management really takes some of the guesswork out.
But I guess the difficult part is when you do run into those issues, finding out how to actually take back control over those components. So hopefully,
we'll bring up some interesting ideas for you and help you all point you on the way to how to
address some of these things. What are some, you know, before we dive deep into them, what are some
of the, you know, main issues that you see?
I don't know if we're going to address them all today, but let's get a list because we
are going to keep revisiting this topic until we exhaust them all.
What do you see as the most common .NET performance problems?
Well, if I look back at the last 12 months – I think I mentioned this before in some other episodes – we have the Dynatrace free trial that everybody can register for and which converts to a personal license. So in case you don't know, bit.ly slash DTPersonal.
And Brian, do you want to do me the favor and repeat it?
Because my strange Austrian accent might not be that clear.
Yes, but mine might be just as unclear to anybody who's not listening from the United States.
So I'm expanding my worldview and admitting that I do not have the normal accent.
So, yes, bit.ly, B-I-T.L-Y, slash – what was it?
DT Personal.
That's right, DT Personal.
Yeah.
I have a very short memory span.
So people sign up for the free trial, and part of the free trial and then the personal license is an offer that I make to send data to me. It's called Share Your PurePath, and people have shared their PurePaths with me over the last 12 months. I got 200 people sharing their PurePaths with me, and a lot of them were .NET PurePaths, or PurePaths from .NET applications.
And back in early last year, actually,
I was just looking at the blog here.
I wrote a blog on C-sharp performance mistakes,
top problems solved, and I list a couple of them.
And if I just read from top to bottom: wrong sizing of available worker threads, that's one thing; loading too much data from the database; excessive use of database connections; expensive string concatenations; high memory usage; too many background threads; too much asynchronous activity; and just excessive use of exceptions.
So these are some of the things that I see.
And it's actually not that much different from Java, I would say – we all suffer from the same problems, even though we always point the finger and say, oh, look at the Java people.
They don't know how to write code.
And the Java people say, oh, look at them, strange Microsoft people.
They don't know how to write code. But in the end, we all write applications, and we all suffer from the same problems: either having a bad architecture and ending up too chatty between the application and the database, or now between services, sending too much data over the wire; or, if you build web applications, building pages that are just totally overloaded. And in order to communicate from one tier to another, we typically have some type of connection pool in a multi-tier environment. And in a multi-user environment, obviously, we have threads that are executing all of our requests. So that's where we need to look at thread pools and connection pools.
And the other highlights that I had here are memory leaks, memory usage, heavy garbage collection, and doing stupid things like very expensive string concatenation – so instead of using string buffers or StringBuilders in .NET, just, you know, building everything through string objects. I mean, some of that stuff the compiler already optimizes for us, but still, I've seen very strange and interesting things.
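To make the string concatenation point concrete, here is a minimal C# sketch; the data and variable names are made up purely for illustration and are not taken from the blog post.

    using System;
    using System.Text;

    class StringConcatDemo
    {
        static void Main()
        {
            string[] items = { "alpha", "beta", "gamma" }; // stand-in data

            // Anti-pattern: every += allocates a brand-new string and copies the
            // old content, so building a large result this way gets expensive.
            string slow = "";
            foreach (var item in items)
            {
                slow += item + ";";
            }

            // Better: StringBuilder appends into a growing buffer and only
            // materializes the final string once.
            var sb = new StringBuilder();
            foreach (var item in items)
            {
                sb.Append(item).Append(';');
            }
            string fast = sb.ToString();

            Console.WriteLine(slow == fast); // True - same result, far fewer allocations for large inputs
        }
    }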
So there's a long list, and if people want to read more about it, I say go to blog.dynatrace.com and search for it. It's the APM blog, right? I think it's the APM blog, correct. I think if you enter blog.dynatrace.com it will redirect, so you will find it. That's the old URL. Exactly – back when we were Dynatrace. What's wrong with you? So, basically, a long list of things.
And I think we said to keep a single episode, let's say, digestible, we want to focus on two things, right?
At least we try to get through at least two things.
What was that?
Can you remember, maybe?
Managing your worker thread pools.
And we were going to tackle the database.
But I think we may even find that the database topic,
you mentioned several database problem patterns there already.
So we'll see what we can get through on that.
Of course, for all of our listeners,
we are going to, once again,
address the most beloved database problem out there,
the N plus one query issue.
But it's interesting that there are several other database issues in there. Not just .NET, I would say, but, you know, looking at that blog that you had written, there are some more fun ones, and I really liked the idea of leveraging stored procedures. But we'll get to that when we get into the database. Let's start with the worker threads, though.
And why don't you
tell us what you've seen.
So the number one question that I always get: people that use Dynatrace, I think most of them, actually deploy an app, typically in IIS, because it's either web services that they have to deploy or a web application, and then they show me the PurePath. And if you know the PurePath, the PurePath starts, in this case, on the native layer of IIS. IIS takes the request that comes in from the browser, or from whoever consumes that service, and it goes through the list of modules – the native modules – and then eventually it passes it over to ASP.NET. So the most common question that I get, people say: why does Dynatrace show me a lot of I/O time in IIS before it's passing it over to ASP.NET?
This doesn't make sense to me.
Why do I see all the time in IIS?
What's happening there?
I ask that a lot myself.
You're really funny today.
Did you already have some margaritas today
No, I didn't, but I'm back on my diet of vegetable and fruit shakes, so maybe without all the extra junk in me I'm a little sharper.
Well, maybe this green healthy stuff is starting to ferment in your digestive tract and build some alcohol. Who knows?
Fermenting – maybe somehow you get there.
Anyway, let's go back to the topic.
So here's the problem.
If you look at IIS and if you look at ASP.NET,
that means IIS natively handles the incoming requests, and it then needs to forward the request to the ASP.NET worker threads, or to the ASP.NET worker engine. And if you have, let's say, a thousand requests coming in and you only have, let's say, 50 threads in ASP.NET, well, guess what happens? All these thousand requests cannot be handled at the same time. The first 40, 50, 60, whatever, can be handled. And depending on how fast you are in your ASP.NET processing, you can handle these requests rather fast. But if you're slow, that means IIS needs to queue these requests. It basically blocks the request on the native part until there's a new free worker thread available in ASP.NET to pass it over to. And this is typically what we see when we look at a PurePath: a request comes in on IIS, and then it takes a huge amount of time before the request gets passed over to ASP.NET.
What I always like to look at, and this is also the number one finding that I have in my blog that is called C-sharp performance mistakes, I look at the elapsed time column.
So the elapsed time column is a column you can turn on in the pure path tree.
And it shows you when
did this particular request actually hit a particular tier. And you see the elapsed time
column starting with zero on IIS, and then typically takes milliseconds before it gets
passed over to ASP.NET, but it could be seconds. And the first example I have in my blog, it
actually took 87 seconds until IIS was able to pass that request over to the ESP.NET worker thread or worker pool.
And then it's clear, hey, there's something going wrong.
So typically it's either that there are too many requests coming in on IIS and not enough threads in ASP.NET.
The other thing that could also happen is that you have some native modules in IIS that are either doing redirects, encryption, authentication, compression, decompression.
If you have a lot of these native modules, it could also be that some of these native modules actually cause slowness.
In this case, I typically recommend using the Failed Request Tracing feature of IIS, which actually breaks down how a request is moving through the individual native modules and where maybe a native module is taking up the time.
But typically, I see the problem of too many requests coming in and not enough worker threads
available in ASP.NET.
And the question is, can we increase the worker threads?
I think Microsoft itself, they have like a little rule internally,
depending on the number of cores you have on your machine, on how many threads are actually
by default spawned. But I believe, and Brian, maybe you know that better than I do,
there's an option to actually configure the number of worker threads.
Yeah, I was just looking this up earlier today, because one of the biggest issues I've always run into when you hit .NET problems – especially around threads, memory, and everything else – is that the common response is, well, .NET handles all that, that's not something we have settings for. Whereas in the Java world, you're manually setting almost all of those settings, if not 100% of them. I'm not exactly sure of the coverage there.
So the important thing to note, though, and again, your search engine is your friend, is look these things up.
And I just came across today that there is a way to override it via code in the more recent versions of .NET.
I believe there are two commands.
One would be get max worker threads, and the other one is set max worker threads.
So programmatically, you can handle that.
But again, it then comes down to a code change in order to do this.
So it's not an easy fix, but there are ways to override those .NET settings and controls.
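The two commands Brian mentions presumably map to the System.Threading.ThreadPool API; a minimal sketch, assuming you have measured that the default limits really are the bottleneck, might look like this (the doubling factor is just an illustrative value):

    using System;
    using System.Threading;

    class ThreadPoolSizing
    {
        static void Main()
        {
            // Read the current CLR thread pool limits.
            ThreadPool.GetMaxThreads(out int workerThreads, out int ioThreads);
            Console.WriteLine($"Max worker threads: {workerThreads}, max I/O completion threads: {ioThreads}");

            // Raise the worker thread ceiling. More threads also means more memory
            // and more context switching, so measure before and after changing this.
            bool applied = ThreadPool.SetMaxThreads(workerThreads * 2, ioThreads);
            Console.WriteLine($"New limit applied: {applied}");
        }
    }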
And I believe you were just saying earlier that there's going to be a lot more control possibly coming out in the next version of .NET. Exactly, yeah. Microsoft is actually, based on what I heard, opening up a lot of the configuration options. And another big topic, even though it's not related to threads: memory management. Right now, in the Microsoft world, you have no control over how large an individual heap space is, or the generations.
And I think this is something they want to change, I guess, to learn from the Java world.
But another thing, I mean, this is –
Go ahead.
Yeah.
So what you said is –
There are two things I want to revisit on what you said, but you finish this thought first.
So what you said, you said there's a way now to override the max worker threads programmatically.
That's great.
Another option that you obviously have in IIS is to use web gardening.
That means in an app pool, you can say, I don't want to have only one instance of an ASP.NET engine running in the app pool, I potentially want to scale it out. So if IIS actually detects, hey, I can only feed so many requests to an individual app pool because there are only, let's say, 50 worker threads available, then maybe we just launch additional instances, and that's called web gardening. I believe they still call it web gardening, where you can say, I want to spawn additional worker processes and therefore distribute the load across more ASP.NET engines that can then handle this load in parallel.
But, I mean, we are a performance company.
So there's two options that you have.
If you have too many requests coming in, it's like if you go to security on an airport, right?
You have 5,000 people trying to get in, and you have one line open from TSA, then you're frustrated.
So what can you do?
You can either create more lines,
which makes a lot of sense, right? Instead of one lane having five, 10 lanes, or you can speed up
the process of individual people going through the security check, like optimizing the time.
And the analogy here would be optimizing your ASP.NET code, because if a request doesn't take one second but, after you optimize your code, only 500 milliseconds to execute, you basically increase the throughput. If instead you just bump it to 100 worker threads, and only do that without addressing the performance of the system –
Guess what?
You're using more memory.
You're going to need more computing power.
But I think the memory would be the scariest one, because it would be very tempting to just say, oh, increase the thread pool size, right? But you have to look at your memory utilization.
You have to look at the memory availability on the machine.
So it's not just as simple as turn something up.
You always have to look at what the effect of that is.
Yeah, what the cost is.
Yeah.
So I wanted to go back to a couple of things back when you started talking about
worker threads. First one, and I don't mean to put you on the spot here if you're not 100% sure,
but it almost sounds like one of the metrics that might help you with the worker thread exhaustion
would be one that's very commonly monitored in performance testing, which is your request queued
in ASP.NET.
Is that going to be the same metric to show you the same idea
so that if you are seeing a delay in here,
you can cross-check it with request queued and put those two together?
Yeah, so ASP.NET has a great number of performance counters,
and actually – I'd probably have to look it up here – but there is a Requests Queued counter, yeah.
So request queued is a great metric to watch because if you see that number go up and up and up and up, you know, well, we have an issue here.
Right, and you could also then take a look at how long until that starts impacting the performance.
So when you do see that web server time increasing, a small amount might be okay.
So you can see what sort of overhead you can take, but then you can figure out what your breaking point is on those threads.
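For reference, the counter the two are talking about can also be read programmatically; a hedged sketch using System.Diagnostics (Windows and .NET Framework only, and it assumes the ASP.NET performance counter category is installed on the machine):

    using System;
    using System.Diagnostics;
    using System.Threading;

    class RequestQueueMonitor
    {
        static void Main()
        {
            // "ASP.NET\Requests Queued" is a machine-wide counter, so no instance name is needed.
            using (var queued = new PerformanceCounter("ASP.NET", "Requests Queued", readOnly: true))
            {
                for (int i = 0; i < 10; i++)
                {
                    // A value that keeps climbing means IIS is accepting requests
                    // faster than the ASP.NET worker threads can pick them up.
                    Console.WriteLine($"Requests queued: {queued.NextValue()}");
                    Thread.Sleep(1000);
                }
            }
        }
    }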
Additionally, you mentioned a lot of the modules in IIS, and I think it's interesting.
I love the elapsed time column in Dynatrace.
I usually put that together with the execution time columns, so I usually have a bundle of those together. But the fun thing is, when it's taking time in the web server – you mentioned a lot of different kinds of modules, and something like compression is going to happen on the way out. So one thing you can do early as well, when you're looking at that time using the elapsed time: if the elapsed time, let's say, is one second in the IIS web server portion, but going from the web server portion to the app pool the elapsed time is very short, then you know it's on the outbound side.
And that can help you figure out where to start concentrating your effort.
So just bringing that up as in, you know, that actually ties into the last thing I wanted to bring up.
You mentioned some other tool within IIS to help you see what's going on in those modules.
I never heard of that before.
Can you expand on that?
Yeah, it's called, hopefully I got it right, Failed Request Tracing, FRT.
If you Google for IIS Failed Request Tracing, it's a tool that you can turn on.
And I think it typically should only be turned on for troubleshooting, but it's basically some native logging from IIS, and it shows you, for your requests – not for all of them, only for those that actually fail, that's why it's called Failed Request Tracing, or those that are very slow – it will give you a detailed breakdown of how these requests are actually handled by the individual native modules. So that's one thing.
And what I heard, I'm not sure if I'm allowed to say that.
I think I'll just say it because who cares.
Our engineering team, obviously, in Linz, Austria, is aware of the fact that this is
a level of visibility that a lot of people would love.
And the engineers on our web server agent, on the IIS agent, have actually figured out a way to get this information now for every single request that Dynatrace is tracing for the PurePaths. So it seems we're getting something soon in an upcoming Dynatrace release.
That would be awesome.
That would be very awesome, yeah, because this is a big blind spot for many.
And I love what you said before: you're looking at the elapsed time and whether the gap is between IIS and ASP.NET, or then on the way back – you know, is it the incoming request or the outgoing response. What I then also always look at is the request and response size. In Dynatrace, you can actually capture request and response sizes both on IIS and also on the ASP.NET engine. So you can actually see, hey, this is a two-kilobyte package.
It's probably not big.
Oh, it's a two megabyte package, and it has to go through compression and blah, blah, blah.
It also gives you a little more information on where time might be spent here.
So that's a great one.
What I also like, maybe another one to mention, unless, Brian, you want to say something?
No, no, no.
I was just getting ready for something else.
Okay.
One last thing.
We told the audience to look at the elapsed time column, which is great.
I also always enable the thread name column in the PurePath tree, which shows you which native and also which ASP.NET thread is actually involved in executing these things. The reason why this is very interesting: if you have not only native modules but also managed modules in IIS, which I think are more and more common now, you can actually see the sequence of modules being executed and which threads are involved, and actually how many different threads are involved to process a certain request.
That's also very interesting.
And it becomes even more interesting if your code in the backend or in these modules is
then spawning additional background threads because they want to do something asynchronously.
So you're basically binding a lot of threads, doing something and maybe waiting on something, when this thread should actually be freed up for new incoming requests. So coming back to what you said, it's not only about throwing more hardware at the problem and spawning up more instances, it's about optimizing the
execution, the code that you have right now. And that just gives you all the visibility.
That's pretty cool. Right. But that's always the hard part, right? That's when you start getting into possible re-architecture, but that's what's needed to address these problems properly. Yeah. Right. So people might complain when they see what they're up against, but the trick is seeing what you're up against, having the tools to see that, and then getting it done. Because again, hardware is only part of the problem.
And obviously sometimes hardware is something that needs to be added.
Having to add hardware can sometimes be a good thing, right?
Because it could mean your business is so successful,
your code is optimized, man, you just have so many people,
you need to have more hardware.
But you should always be looking into that code optimization first
because otherwise you're just going to shoot yourself.
You're kicking the can down the road.
Yeah, exactly.
And especially if more and more people are moving to cloud infrastructure – in the Microsoft world it might be Azure – then it's very easy to scale, right?
I mean, they do the scaling for you.
They just add more and more servers in the virtual land.
And add more charges to your account.
That's it.
And at the end of the month, you figure out, wow, why do we have to pay 50% more if we
only have 10% more users on the system?
Oh, because the latest code deployment had a little obscure resource consumption problem.
We're now consuming that many more CPU cycles or we're blocking all of these threads. That's why
IIS is spawning up all these worker threads.
And yeah, so gotta be
careful with that.
Okay, well.
What is that? That means it's time for the Pure Performance trivia no-prize question. Oh, my God.
Most people don't even – unfortunately, nobody except me can actually see you, because I see you.
I can see you in my Skype video chat.
That's amazing.
Anyhow, yeah, it is.
It's a little Korg monotron.
Anyway, yeah, so I think we're done with the worker threads.
It sounds like we are.
I didn't mean to cut you off.
No, I just wanted to maybe sum it up.
Oh, yes.
So that was actually our summary.
That was our summary, summing it up music.
Yeah, perfect.
Okay.
So from my perspective, what I encourage everybody to do is look at how many requests are coming in on IIS and how many threads are active on your ASP.NET engine. If you use Dynatrace, look at the elapsed time column and the thread name column, so you can actually figure out, on your critical transactions, how many threads are involved and where the time is. Do you have a gap between IIS and ASP.NET, where IIS has to wait to hand the request over to ASP.NET? If the gap is after ASP.NET, it's typically some of these native modules that need to handle a lot of bytes that are coming back – maybe a call to compression and stuff like that. Yeah, so that's something that you want to look out for.
And I think this is – anything else that you want to add?
Nope, not to that one there.
Maybe check out the blog.
If you look at the –
Yeah, it's a good idea.
What's the name of the blog?
It's called C-Sharp Performance Mistakes.
And the full title is "C# Performance Mistakes – Top Problems Solved in December". That was basically a blog from January 14th, 2015, and it's on our apmblog.dynatrace.com. Excellent.
alright so
now we get to
the trivia question I would love
to announce who won last week's question.
But as we said before, this is recorded.
This is actually – you know, I'm giving a little inside information here. We are launching this podcast, and we're recording several episodes before we launch. This is the last of that batch; this is our fifth episode. And now our plan is to launch it, so we'll be a lot closer to live once this one airs. Anyhow, I can't announce the winner of last week's because it hasn't aired yet. We're still back in the future. But that will be posted up on the website. In the meantime, though, we have our new question.
This time it's going to be from Andy.
And Andy, I'll hand it over to you for this.
So my question is, I mean, we discussed three questions with three options.
And what I would like –
We only do one question.
I know.
Because there can only be one winner.
I know.
Oh, okay.
I thought you said you were going to do three questions.
But the two of us, we discussed three options of questions that I ask.
And I think I want to go with the second one that we discussed. So my trivia question is: what was the first programming language that I picked up? And it's not as simple as saying C or something – it also has to be very specific on the platform.
simple as it's c or something so it has to be very specific also on the platform. So what was the first programming language
and on which platform did I implement
that
program?
So we can already exclude C
from that based on... We can exclude
C and the... But we won't
exclude anything else. But approximately
what your...
I think maybe your first programming language might have been Ruby, right? You're 12 years old, I think.
Yes. No – to make it easier, not easier, but as a little tip: if you
go to my SlideShare account, SlideShare.
And search through all of your slides.
Yeah. And in some of my presentations, I give a little intro to myself.
And I actually covered my first computer that I ever had.
And it was back in the late 80s where I got this computer from my parents as a Christmas present.
And in the beginning, obviously, it was only about games, but then soon I realized there's some cool stuff on that thing as well. And yeah, so go to SlideShare – slideshare.net slash grabnerandi – and then you will find some of the slides that give you a hint of what my first computer was, and based on that you just
figure out what the most popular, very easy to use
programming language was.
Right. And when you
think you have an answer,
tweet it to
@Dynatrace, hashtag PurePerformance,
hashtag no prize,
K-N-O-W-P-R-I-Z-E,
and if you are
the first person with the correct
answer, your name will appear on the site saying you won.
Isn't that exciting?
It's very exciting.
I think it's exciting.
And if we ever meet you in person, we will also give you the privilege of buying us a beer.
Of them buying us a beer, right?
Because we can't give out prizes.
Well, no, we can't give out prizes.
That's true.
Real ones.
It'd be more fun if they win, they buy us one.
Yeah, that's more fun for us, definitely.
Anyhow, moving on then.
So hopefully good luck on that.
And again, we're trying not to make them super easy Google answers.
And we also, I was thinking about this in between too,
don't flood it with a million answers possible, you know, because I'm sure there's going to be somebody out there
who's just going to start tweeting every single programming language
on every platform possible.
So please don't do that.
That's cheating.
If nobody wins, then nobody wins.
And at some point we'll reveal these answers.
It will all be clear one day when Skynet takes over.
Anyway, okay, so let's talk about .NET.
Speaking of Skynet, let's talk about .NET performance problems going into the database.
Yep.
Right?
And I think one of our – do we want to tackle the elephant in the room first,
or should we talk about something else database first?
I think I want to – I mean, the elephant in the room, you mean the N plus one query problem? Yeah, what are you talking about? Well, I think we should just talk about it in general, because maybe we'll figure out which areas we can cover. But, I mean, in the end it's going to be the N plus one query problem, because that's the number one thing we see out there, right? That's just the thing. Well, let's talk about some of them in general, right?
So there's obviously N plus one,
which you've heard us talk about before.
Yeah, but I think the bigger problem is
so the applications that Brian, you and I looked at,
we always see, I think the trend is
developers typically don't write their SQL statements
on their own anymore
because most often it is hidden through a framework, right?
What are the most popular frameworks out there?
Hibernate?
Hibernate, yeah.
Well, that's not...
NHibernate.
And NHibernate is the equivalent on .NET.
Then, you know, we are using the ADO.NET Entity Framework to query entity objects from the database.
We can use LINQ as the query language, which then gets translated into SQL that is executed through ADO.NET.
But as a developer, typically I build my object model and then I develop against that object model.
And that means that I often don't even know what's going to be executed.
And this is actually, I think,
one of the reasons why we see so many crazy things.
And I'm just looking at the same blog post
that I mentioned before,
C-sharp performance mistakes,
top problem solved in December.
And the number one finding was 600 SQL executions each
on a separate DB connection.
That's even more interesting
and more of a good thing.
More awesome than N plus one?
Yeah.
Well, it's N plus one, but it's every single SQL statement on its own connection, which
doesn't make a whole lot of sense.
But I do want to point out, it doesn't always have to be N plus one.
I mean, as far as 600 transactions doesn't have to always be N plus one.
It could just be some really crazy query structure or, you know, I mean, you could just be, I don't know what, whatever.
I don't know what I'm talking about.
Well, you know.
You take over here.
I'm just saying it's quite often N plus one is the issue behind those large volumes.
But every once in a while, you'll get to something where people are just making ridiculous amounts of database queries against something, and it doesn't fit that N plus one pattern.
Yeah.
But it's just a ton of –
And an example would be if people are generating a report or they're trying to get all the data they need for a very complex page. And then they are basically looping through different items
and then are grabbing something here,
grabbing something from here and here and here.
And in the end, as you said,
you end up with hundreds and even thousands of SQL queries
and in this case, I think it would make a lot of sense to say, well, you're just implementing business logic in your application, but it's actually very data-specific, so why not move it to the database? And this is something where you would think about putting this stuff into a stored procedure. Right, right. And this is – let's talk about these connections, right? Because the big problem, at least in that one blog that you looked at, is not only that there are – I have seen queries where at times there are 6,000. But in this case, I believe it was 600, and they're each grabbing their own connection.
And there's no connection sharing.
So what's going on in that situation?
Explain that a little. Yeah, what I see is it's typically because the developers are using frameworks
and then they are iterating through object lists.
So they have a loop
and they're iterating through objects.
And every time they iterate through an object,
they basically say, hey Hibernate, or hey Entity Framework, give me the next object, and now give me the next object, and give me the next object.
The way Hibernate implements it in a smart way, they say, well, if you need some piece of information, I will grab the connection, execute the statement and release the connection again.
So it makes sense, because Hibernate doesn't know that you will ask it for the next 599 objects.
So every time you call it, it is getting a connection, executing the statement, and releasing the connection again – so actually following one of the best practices out there of only holding on
to a connection as long as you really need it. But what's wrong with this is that as a developer,
if I know I need a certain set of records and know I need the 600, then you can use these
frameworks, whether it's Hibernate or whether it's Entity Framework or anything else you use.
You can say, I know I need it, so give it all to me in one batch.
That means opening one connection, execute one SQL statement that returns more than just one record, and then closing the connection again.
So this is typically how it ends up like this.
And do these frameworks usually give you that option to reuse?
And so to the listeners, he's shaking his head yes.
So you do have the ability to reuse.
But if you do set that setting – and maybe I'm getting a little bit too detailed here – is that going to be like a global thing, and can setting that reuse cause a problem later on with things that might not have such a large volume?
Or do you have that kind of finessed control over it to say in certain cases use that, in certain cases don't use that?
So I shouldn't have shaken my head, because I should have listened to the full length of your question. Because you said, you know, you have influence over reusing connections – what I really meant is: instead of going through a loop and asking the framework 500 times in a row, give me the next element, the next, and the next one, the option that I actually prefer is, if you know that you need all of these 100, 500, 600 objects – 600 in this case – why not say, hey Hibernate, hey Entity Framework, please, in one batch, query all this data, give it to me in a list, basically an in-memory representation of these objects, and then I iterate over it.
That would be one option if you already know all of this. If you don't know up front how
many objects you really need, then the option that you should have in these frameworks, depending on
the framework you're using, you as the developer, you are taking a connection out of the pool and
then you're passing that connection into the framework itself and say, please reuse that
connection. And I may call you multiple times, but reuse it on that connection. And when I'm done, I am responsible for returning that connection to the pool again.
So with that, I'm not giving it to – I'm taking over responsibility and control of the connection.
That's kind of my thought.
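A sketch of the difference Andy describes, written against an Entity Framework-style context; the entity, context, and property names here are hypothetical and only meant to show the shape of the two approaches:

    using System.Collections.Generic;
    using System.Linq;
    using System.Data.Entity;

    // Hypothetical entity and context, purely for illustration.
    public class Order { public int Id { get; set; } public decimal Total { get; set; } }
    public class ShopContext : DbContext
    {
        public DbSet<Order> Orders { get; set; }
    }

    public static class OrderLoader
    {
        // Anti-pattern: one statement (and one trip to the connection pool) per item.
        public static List<Order> LoadOneByOne(ShopContext db, IEnumerable<int> orderIds)
        {
            var result = new List<Order>();
            foreach (var id in orderIds)
            {
                result.Add(db.Orders.Single(o => o.Id == id)); // N round trips
            }
            return result;
        }

        // Better: ask the framework for the whole batch in a single statement.
        public static List<Order> LoadInOneBatch(ShopContext db, IEnumerable<int> orderIds)
        {
            var ids = orderIds.ToList();
            return db.Orders.Where(o => ids.Contains(o.Id)).ToList(); // one SQL IN (...) query
        }
    }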
Excellent.
And the other option then as well is you were talking about turning something into a stored procedure instead, right?
That's always a – and that could be a much better option in general.
Yeah, exactly.
Obviously everything –
And I think we are just shying away from that because as an application developer,
if you don't have the background in SQL but you know how to use these frameworks,
you may not even want to write SQL.
You don't want to write stored procedures.
You may not even know about it.
And therefore, it's just more convenient to use these frameworks.
But sometimes you should sit down with your DBA, so with people that have more experience
in database queries, and then say, is there a more efficient way so that I can get exactly
the data that I need instead of me going a thousand times to the database?
Is there an efficient way where we can
write something, a stored procedure, and then a
stored procedure in the end gives me the results that I really need?
And so that will
help a lot, I think.
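If the logic really is data-centric, the call side stays small; a hedged ADO.NET sketch, where the stored procedure name, parameter, and connection string are made up purely for illustration:

    using System.Data;
    using System.Data.SqlClient;

    public static class ReportQueries
    {
        // Hypothetical procedure "dbo.GetMonthlyReport" doing the set-based work in the database.
        public static DataTable LoadMonthlyReport(string connectionString, int customerId)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("dbo.GetMonthlyReport", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;

                var table = new DataTable();
                using (var adapter = new SqlDataAdapter(command))
                {
                    adapter.Fill(table); // one round trip instead of looping over the data in code
                }
                return table;
            }
        }
    }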
And it kind of sounds a lot, you know,
we talk a lot about the performance testers
leveling up and learning different things.
It almost seems like
these frameworks put developers
in a very similar situation where,
you know, even with the N plus one query, where if you just use something out of the box as it is,
you could have all these different problems, and you really need to, even as a developer, level up and understand what your framework is doing and why it's doing it, and figure out if that, in the end, is the better way to do it. Obviously, there's a lot of convenience that a framework is going to offer,
and it really comes down to what does it give you
and what are the dangers and going ahead and learning about what that is
and seeing what it's doing in the background.
So when you have your APM tool of choice in place, you can see what it's doing.
You can see that those 600 connections
being added and get yourself to start asking questions. And the reason why I think we ended up
in this situation where we don't look behind the scenes right away is I was just at a meetup
yesterday here in Boston and great presentation from a guy from Couchbase. He was showing that
database and basically what he did, he presented how it works.
And then in the workshop, we were all downloading a sample app. And we're deploying that sample app,
and it worked. And it was amazing how easy it was. But I guess if you are jumping on the next
framework and the next framework, and you just learn by sample apps, then it's very easy to fall into that laziness of saying, well, I just take the sample app and the code
and the way they showed it to me.
And then I just implement my app in the same way.
And I don't need to.
It's also fast, right?
I mean, I don't need to worry about optimization.
And then later down the line, you realize, wow, you have a performance and a scalability
issue, even though you're now operating on a NoSQL database.
But you're just using it the way a sample app used it, a travel app.
Or actually, in this case, yesterday, there was a travel app.
You know, it's – so really –
It was a travel app?
It was a travel.
It was like –
Everybody loves a travel app.
Yeah, I know.
It's amazing.
In this case, it was a flight.
It was like, show me the flights from A to B.
And it was a great example. But what I'm saying is, and what you also said before, sit down, use tools like Dynatrace or your APM tool of choice or your tracing tool of choice,
and figure out what actually happens when you are implementing your key features.
And if you are making the same round trip 500 to 600 times,
it's probably not a good thing.
And there's always something to optimize.
And most of these frameworks have a lot of features.
But these features are typically not shown in the Hello World example.
Right.
That's what it is.
And, yeah.
So too many database queries on individual connections is a bad thing.
Another thing that I see a lot is just querying too much data.
So, executing SQL statements that return too much data that nobody really needs. Like, give me everything, and I'll figure out later what I really need.
That's kind of the other extreme, right?
Give me everything, which is not good and another thing that i see a lot not in the what i see a lot we
typically talk about how many database statements are executed in a single transaction but if you
look at the lifetime of your app or if you look at the transactions that come in in an hour and
if you see the same set of select statements
being executed all the time,
like give me the product information of this product,
give me the price of that product,
give me this and this static information
or kind of static,
then what do we do with this?
What's the option?
Cache.
Cache, exactly.
I mean, you need to figure out a way. We have a lot of
cool frameworks out there that allow you to
distribute the caching that
work extremely well, also if there are updates.
So try to cache the data
closer to your end users and not
always go to the database if the data
doesn't change that much.
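The simplest form of this is an in-process cache in front of the database call; a minimal sketch using System.Runtime.Caching, with a hypothetical loader delegate standing in for the real query and an arbitrary ten-minute expiration:

    using System;
    using System.Runtime.Caching;

    public static class ProductCache
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;

        public static string GetProductName(int productId, Func<int, string> loadFromDatabase)
        {
            string key = "product-name-" + productId;
            if (Cache.Get(key) is string cached)
            {
                return cached; // served from memory, no database round trip
            }

            string name = loadFromDatabase(productId);
            // Semi-static data: keep it for a while and tune the expiration
            // to how often the data actually changes.
            Cache.Set(key, name, DateTimeOffset.UtcNow.AddMinutes(10));
            return name;
        }
    }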
Right, and that almost sounds like
something, some
advice that would be given out 10 years ago, right?
Because these in-memory caches have been around for quite a long time, and people have really been doing this for, again, 10, I don't know how long, quite a while now.
But you still run across this stuff all the time, especially, once again, if you start using a framework and, from a development point of view, you're not paying attention to these components. This is where your tools are going to help you see that information and be able to be a hero and say, hey, how come we're making a call for store locations again? Well, I guess store locations isn't too commonly used, but you get the point – something that's queried very commonly, over and over again.
Why isn't that being cached?
I do want to talk a little bit about N plus one, right, because we've been referencing it, but it's been a while.
It's been a few episodes since we've talked about it.
And in case people haven't heard us from the beginning, N plus one is probably one of the most common database issues that we see all the time. And
this again is very, very visible when you're looking at a single transaction,
and you see the same query being run multiple times. Usually it's going to consume a large
portion of your time, but even if it doesn't consume a large portion of your time, it's still
inefficient. It might be leveraging connections, uh, could be holding up threads,
a lot of other things, right. But you're going to see large number of times of the same query
being made, oftentimes with the same bind variables, right? So it's a very, very problematic thing. And I think a good way to describe what an N plus one query is, to kind of simplify it – N plus one for, I don't want to say dummies, because that would include me – is a good example that might happen in code: you say, give me a list of all of the student IDs, right? Pull back all the IDs for all the students. Now you have that list, and then, in a separate query for each student ID, you go back and ask for that student's hair color, one at a time. I think a better option would have been: get me a list of student IDs and their hair color, you know, joined with the hair color table.
So you end up running a whole bunch of extra extraneous queries, throwing on a lot of extra
time because these queries might only take 10 milliseconds, right?
It might be a really fast query, but in total on that entire transaction,
this could take half a second, one second.
You know, we've seen some, some crazy examples of it.
And it's probably one of the most common issues out there.
You see it all the time.
You are a mouthpiece against it.
So let me, let me give you a chance to speak on that a little bit too,
especially in context of this.NET stuff.
But I don't think there's really
a Java or.NET context
to N plus one queries, is there?
No, it's not.
It's basically, as you said,
I think you explained pretty well
that it's just an access pattern,
a bad access pattern.
I wouldn't even call it a problem; it's an access pattern which could become a problem.
And the problem is
that it's data-driven.
That means the more results you have,
the more often you go to the database
with these round trips.
And I know we always call it
the N plus one query problem,
but actually it should be called
the other way around.
It should be called one plus N.
Because what you said,
you start with one SQL query, say, give me the list of student IDs.
And then in a loop, I go to every single student ID and then make a query to get the hair color.
So if you have five students in the database, who cares?
If you have 10, who cares?
If you have 10,000, you should care.
But you should actually already figure this out earlier.
And that's why, I mean, we talked about this when we did our load testing webinar or episode, when we talked about good data, right?
We have to have good sample data or test data.
And this is a clearly easy-to-identify data-driven problem. If you are testing a feature
and you're testing it against a small sample database
and then a larger production kind of like database,
and you see the number of queries going up
with the number of rows in a table,
then you know you have a data-driven performance problem
and it's easily identified by looking at this
N plus one query access pattern saying is the same SQL
statement executed more than once in a transaction and does this change over time the more data we
have in the database and and for testers I typically say if you are a tester and if you're
testing a feature that allows you to give input data like like in your case the search right
you're searching for people with with the color
with the hair color gray or with the hair color um i wouldn't say orange but i'm not sure who has
orange hair well i guess they consider that red hair but it's really orange yeah so basically what
i wanted to say clowns clown carrot top of course carrot top so so basically what i'm saying is if
you're a tester and you you write the for searching per hair color, you should execute the same test with different colors and then figure out, is there a correlation between the number of results and the number of database queries you're executing?
Because if there is, it's a classical M plus one query access pattern problem.
It's a data driven problem.
Right.
And I want to point out that everybody is susceptible to this problem. It's a data-driven problem. Right. And I want to point out that everybody
is susceptible
to this problem.
Yeah.
And I bring this up
because I recall
from a presentation
you did in the past
that when you started
the whole kind of free trial
and share my pure path
and all this,
you all had
this exact problem
on your data bank.
Correct?
Yeah.
On the backend system we use.
So we store every little, every person.
We keep record, obviously, who registered and what they're doing.
And I have a report that I can pull and I can say, show me all the people that I've
invited.
And then we had the classical N plus one query problem using Hibernate. And the more people that ended up in our database because they registered for the free trial, the more queries were executed when I hit that report, and the slower it became. Yeah, that's actually a perfect point. And I want to point out that you did not write that code, though. I did not write it, but I would have done it the same way. I would have done it the same way, yeah. You dropped in Hibernate, you think that's going to take care of it.
And suddenly you find out you're susceptible too.
So it's nothing to be embarrassed about if you do create a one plus N query problem.
Don't be shy about it.
It happens to everybody.
It's okay.
Exactly.
Right.
Yeah. And I think there's probably more database-related access patterns that we should watch out for.
Actually, like four or six weeks ago, my colleague Harald Zeitlhofer and I had an article on InfoQ, and we talked about the different database access performance problems.
So that might actually be another interesting one.
It's called, it says,
Diagnosing Common Java Database Performance Hotspots
because we focused on Java.
But as we said in the beginning,
it is the same for.NET as well.
And we talked about exactly these problem patterns.
We talked about prepared versus unprepared statements.
We talked about connection pool exhaustion.
So there's a lot more stuff. But I think, obviously, we could talk forever.
But I think for this episode,
I think we hit on the most common ones, I would say.
Right.
And we will pick more of these up in the future.
Do you want to,
anything you want to summarize about the database
or was that your summary right there?
Yeah, the summary is: don't think you always know what's really going on, especially if you go through frameworks. And don't think you shouldn't have to know about it, because you do.
So I think you should
if every time
when you pick up a new framework
a new language
or anything new
as a good developer
as a good architect
it should be in your best interest
to understand what
these frameworks are actually doing and go beyond
the Hello World example and beyond the
travel apps
example. That's just very important
to understand what are your use cases
and how can you
in an optimal way query
the data that you need without going too
crazy on the database.
Yes, and I would like to add to that as well.
I think your advice is really good for the development teams who are working on these.
And for the performance teams, I would like to add to really start looking at the data
so you start becoming familiar with what these patterns look like.
There are some great examples on the blogs that Andy has written, so you can see the visuals along with it. But yeah – get familiar with these, because
these are going to probably be some of the most common things you're going to see over and over
again as you're running those tests. And the smarter you can be, the more you can be part of that broader team – you can be a great contributor and really show people
that you know what you're doing and the most important thing is just really helping get that
code out there so there's a lot for everybody to learn about this there are a lot of metrics
as we've covered over and over to help you see these things um so keep your eyes sharp and keep
learning for everybody developers too yeah right exactly, exactly. Everybody has to learn.
Yep.
All right. So if you have any other questions, suggestions, show ideas or anything, feel free to send
us an email at pureperformance@dynatrace.com. You can also reach us through the Dynatrace Twitter, which is just @Dynatrace.
I would add the hashtag pureperformance.
If you wanted to follow us or communicate with us directly,
my Twitter handle is EmperorWilson, E-M-P-E-R-O-R-W-I-L-S-O-N,
because I am the emperor in my head.
And GrabnerAndi, that's G-R-A-B-N-E-R-A-N-D-I.
That's Andy's Twitter.
I could have let you say that, but you can say it yourself.
Please repeat that for my funny accent.
So my Twitter handle is @GrabnerAndi, G-R-A-B-N-E-R-A-N-D-I.
A very particular I on the end and not Andy with a Y.
And, yeah, I think we really – I'm really looking forward to some feedback on other topics that we should cover.
Also, very happy to take people on the show, right?
We want to interview people.
That would be awesome.
We already have some cool names in here that came in over the last couple of weeks, once we started kind of advertising the stuff that we're doing with our readership and with our followers – that hopefully will be followers. But yeah, a lot of cool stuff. You know, I was talking to Mark a little bit about the early days of PerfBytes – that's Mark Tomlinson; PerfBytes is another performance podcast, and he inspired us to start this one up. But his original idea for the show – have you ever heard the show Car Talk? Yeah. It was on NPR, it was those two guys from
Boston. So he was telling me that was their original plan for the show to hopefully have
some kind of call in thing. Um, obviously that doesn't work too well in the podcasting thing
and trying to do things live, but it, it does bring up to the point of, you know, I see where
he wanted to go. It would be great to be able to have, um, any of our listeners be able to contribute to the show, to be able to come in and tell us about some of the,
uh, uh, great things they've seen and or solved or crazy problems they've encountered.
So do, as Andy said, please contact us. If you have anything you want to add to the show,
if you want to be a guest on the show and talk about things, we'd love to have you on
just wouldn't be live. And we don't have the funny accents,
well, different funny accents we have, I guess, but not the Boston accent we love. I was actually in Washington yesterday at a conference and presented, and I started out and I said, where do you think I'm from? And the first guy said, you have a Boston accent. I said, really? Because, I guess... I don't know. I picked a word.
You said car?
Idea.
Car or idea.
I think I said idea with the R in the end where there's no R in idea, as we know.
But he picked it up.
I said, well.
That's very astute, picking up the Boston accent underneath the Austrian accent.
Exactly.
That's quite complex.
Anyhow, thank you, everybody.
We'll wrap up today's show. Big thanks to everyone for listening, as always.
thank you
talk to you soon
happy Cinco de Mayo
belated as we said
yes happy Cinco de Mayo
hope you were safe
And if not, you might not be listening to this. Hopefully you've sobered up by now. Goodbye, everyone.
Bye.