PurePerformance - 004 Top Java Performance Problems
Episode Date: June 6, 2016
The Java Runtime has become so fast that it shouldn't be the first one to blame when looking at performance problems. We agree: the runtime is great, JIT and garbage collection are amazing. But bad code on a fast runtime is still bad code. And it is not only your code, but the 80-90% of code that you do not control, such as Hibernate, Spring, app-server-specific implementations or the Java core libraries. Listen to this podcast to learn about the top Java performance problems we have seen in the last months. Learn how to detect bad database access patterns, memory leaks, thread contention and, well, simply bad code resulting in high CPU utilization, synchronization issues or even crashes!
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello and welcome to another episode of Pure Performance.
My name is Brian Wilson and with me as always I have Andy Grabner.
Hello Andy, how are you doing today?
Hey Brian, I'm pretty good. I'm just a little shocked about the snowflakes out there. And it is end of April. I think it's just stopped.
But we're actually recording out of Denver, Colorado, right?
Your hometown.
Downtown Denver.
We're recording live in Denver, Colorado.
That's amazing.
We're always recording live, though, aren't we?
Of course.
Yeah, we're at a hotel in downtown Denver.
Andy's in town for, what are you in town for?
I know this is going to be past already when our loyal followers listen to this.
But just in case they're interested in what the topic is that you're going to be addressing tonight,
what is it that you're doing here in Denver?
Well, besides salsa dancing, what I did last night, I actually go to a meetup tonight.
I'm presenting at the Pivotal Cloud Foundry meetup, and I'm talking about metrics-driven continuous delivery. So basically baking performance metrics into the pipeline
so that people that actually deliver software
and run it on Pivotal Cloud Foundry
don't just run their apps
and then go totally crazy over performance problems.
So yeah, it's about performance, but metrics-driven.
That's what I'm here for.
Yes, you're always about performance.
So Cloud Foundry is really cool stuff. I'm just starting to get into it a little bit too, but unfortunately
I cannot be there. It is my daughter's fifth birthday today, so happy birthday, Vivian.
Happy birthday. Yay. So today we have a fun show. It's probably going to be one in a series of shows
about top Java performance problems and problem patterns.
Did I say that right?
You had a better title, I believe, didn't you?
I think, well, top performance problems in Java or something like that.
And as you said, it's probably going to be a series, because there are just so many of these problem
patterns.
And we've been talking about them for years and years.
And that's also why our customers, I would say, and the users of Dynatrace, you know,
they keep bringing us these examples of extremely bad
problems that happen in the Java Enterprise
apps. Right, and
so we're going to be covering, we're going to attempt
to cover CPU hotspots,
some memory patterns
and problems, and if we have
time, we will get on to
the top database problems.
You might have heard or seen some
stuff about the database, but it's a very important topic and one that we definitely want to tackle. If we don't
get the database today, though, we will definitely get into it in the next one. And there's plenty
more that we're going to cover in future Java performance problem episodes. But before we
start, well, one thing: there are several ways you can contact us.
You can send an email to pureperformance at dynatrace.com.
That's P-U-R-E-P-E-R-F-O-R-M-A-N-C-E at dynatrace.com.
Also, my Twitter handle is Emperor Wilson, so E-M-P-E-R-O-R-W-I-L-S-O-N.
And Andy, what's your Twitter?
My Twitter handle is GrabnerAndy. So it's G-R-A-B-N-E-R-A-N-D-I.
Yes, A-N-D-I, not A-N-D-Y.
Exactly. That's what I mean. And any other ways? I think people can probably tweet and
should follow the Dynatrace handle.
That one too, yes. So it's just at Dynatrace, D-Y-N-A-T-R-A-C-E. Exactly. And I think
the Twitter handle for Pure Performance is taken, but we could obviously use a hashtag. So feel free
to use the hashtag, hash Pure Performance, if you have any questions, and we'll follow that too.
Right. And probably the best way in general would be tweeting over to Dynatrace, hashtag
Pure Performance. And that's it.
So anything else you wanted to bring up before we start?
No, I think let's get started.
Oh, yeah.
Actually, thank you.
It's awesome that we sit in the same room and you keep reminding me about stuff.
And that's actually the segue over to getting started.
A lot of the problems that we see and the reason why we can talk about it
and write about it and speak smart about it
is because a lot of our people out there use Dynatrace, but not only the paid version of Dynatrace, but also the free trial.
So I'm promoting the free trial because I run the free trial program.
So if you ever want to test the stuff out that we are doing and talking about,
just go online and search for the Dynatrace free trial. Register for it.
You get a 30-day free trial.
And then the cool thing is it converts over to a personal license,
which means you can keep using Dynatrace full product, full feature on the apps that run on your local machine.
So perfect for developers to do the sanity checks.
Perfect for testers, I would say, as well.
If you're testing some apps on a machine, install Dynatrace on it, and then you get all these cool things.
And you can actually identify.
And now there's a segue over, too.
And segue is actually fun, because the first company I worked for was also called Segue Software.
So let's segue over to the Dynatrace free trial and personal license. Most problems that I've been talking about and writing about
actually came in from people that leveraged the fact that I have a program
that is called Share Your Pure Path,
so people can use Dynatrace, export the data, and then send it over to me.
And, Brian, believe it or not, Java is the number one technology, I think,
that we still see out there.
That's also why I guess we picked, at least from our customer perspective, right?
They write a lot of apps.
Right.
And I got a total of 200 people in the last 12 months that sent me that data.
How many did you say?
200.
Okay.
I thought you said 1,200.
1,200 would be a little excessive.
Well, I would love to.
But 200 people sent me that data in the last 12 months.
And I analyzed them.
And now I think I want to talk about what we saw on the Java side.
Right.
So we'll start with the CPU hotspots.
And for everyone out there, obviously, a CPU hotspot is a hotspot in the CPU.
But more importantly, there are many different kinds of CPU hotspots, right? There's just general CPU consumption, and we can also be talking about synchronization issues and wait issues. So why don't you start with some of your favorites? You see a lot more of this than I do.
Yeah. So the first thing I wanted to say: a lot of people say, well, for CPU hotspots I can just use my profiler.
Why do I need another tool?
And I agree with them.
And I say, if you have a single-tier app and you're a developer and you know how to use your profiling tools, then it's great.
But A, those people that may analyze performance don't have a profiler.
They don't have the IDE where they can do all that stuff.
So that's why they are going to tools like Dynatrace or other APM tools. And also if
you have a distributed app, if you have one tier calling another tier, if you think about
a microservice architecture or a service-oriented architecture, then you need to actually trace
CPU hotspot from one tier to another tier. And that's why it's so nice to have more professional
tools available. So when we talk about
CPU hotspots, I typically, what I do
when I get, when people share data with me, I
open up the response time hotspot dashboard in
Dynatrace. And as the name it says,
it gives me the hotspots of the response
time and it actually breaks it down into what you said,
Brian. The CPU,
like which methods actually consume CPU.
It also shows me
synchronization time.
It shows me wait time.
So sync would be if code is actually syncing on each other. If you have sync blocks or sync methods,
and they have to wait for each other to actually enter that block,
wait would be waiting on an object.
And the fourth one would be I/O, which is also very interesting.
Right.
And when you say waiting on an object, are you talking about a memory object? The only reason I'm asking is that you used the word wait with both sync and wait, right? So let's just clarify that for everybody, because sometimes, especially when you're just looking at them initially, they can be a little confusing: why is one wait and why is one sync?
That's a very good question.
So basically, synchronization is just, I would say,
a language feature or a runtime feature
where you can say, I have a piece of code
that only one can enter at a particular point in time
because it's handling, it's dealing with some shared resource,
and only one can have access to it at a time.
Waiting on an object is sometimes used for the same thing.
So you can actually say, hey, I have an object, a Java object, and I'm waiting on it until anybody else notifies
it. So it also can be used for synchronization. So I'm not able to pass that line of code until
somebody notifies that object so that I can get past it. But it can also be used for other
notification mechanisms. But typically, they're very closely related, I would say.
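To make the distinction concrete, here is a minimal sketch, assuming a hand-rolled producer/consumer handshake; the class and field names are purely illustrative:

```java
public class SyncVsWait {

    private final Object lock = new Object();
    private boolean dataReady = false;

    // Synchronization: only one thread at a time may execute this block,
    // so threads blocked here show up as "sync" time.
    public void updateSharedResource() {
        synchronized (lock) {
            // ... touch the shared resource ...
        }
    }

    // Waiting: the thread parks on the object until another thread notifies it,
    // which shows up as "wait" time.
    public void consume() throws InterruptedException {
        synchronized (lock) {
            while (!dataReady) {
                lock.wait(5_000); // wake up periodically to re-check the condition
            }
            // ... use the data ...
        }
    }

    public void produce() {
        synchronized (lock) {
            dataReady = true;
            lock.notifyAll(); // release any threads blocked in wait()
        }
    }
}
```

Threads queuing at the synchronized block contribute to synchronization time; threads parked inside wait() contribute to wait time.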
Yeah, yeah, absolutely.
And would you also, I know this one we're going to cover on the memory side,
but would you say garbage collection can also be considered a CPU problem?
Oh, yeah. It kind of straddles both worlds, right?
Yeah, it does, because if you are too aggressive with memory allocation
and the garbage collector needs to kick in all the time,
that obviously means that you have higher CPU usage. In Dynatrace, we actually also show that.
And thanks for that. Yeah, this is kind of a shameless plug, because it's one of my favorite features of our tool, and I don't mean to be plugging our tool, but heck, we work for them. With GC, when you're running garbage collection, you're actually consuming CPU cycles to do that, right? And if you're not able to see that specifically as GC in that kind of breakdown, you might think it's just generic CPU consumption. You'd be left guessing: at the same time there was some GC activity and there was high CPU here, so we can assume the CPU utilization was from that GC. But if your tool can actually show you this was actually GC and identify it as that, then that really just breaks it out into a much clearer and easier-to-diagnose problem. Well, obviously it's not an easier-to-diagnose problem, because then you're dealing with a memory issue, and those kinds of things always get really sticky fast.
Let's quickly cycle back to CPU hotspot.
So I always go into the response time hotspot.
I see in which layer of the app we actually have real CPU consumption,
meaning methods are just cranking on the CPU.
What I typically see is it's either methods
that really have a bad algorithm implemented, strange loops, you know, like doing too much work.
That's one thing.
So you can actually drill down to the method level, figuring out which methods need the CPU cycles.
The other option, and what I still see very often, even though we've talked about this in the industry for years, is CPU hotspots related to string allocations and string manipulations. I still see a lot of apps using string concatenation where they are building large strings, maybe even HTML pages or reports, and on the one side allocating a lot of memory because they're creating and attaching and appending strings, but it's also a lot of CPU that is consumed when string objects are copied over from one object to the other.
So this is a big thing.
So I typically find CPU problems when I look at methods
that are somehow dealing with string manipulation.
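As a hedged illustration of that concatenation pattern, here is an invented report-building example; the first method copies the whole string on every append, the second reuses one buffer:

```java
public class ReportBuilder {

    // Anti-pattern: each += copies the entire string built so far,
    // so CPU and allocations grow quadratically with the number of rows.
    static String buildReportSlow(String[] rows) {
        String html = "<table>";
        for (String row : rows) {
            html += "<tr><td>" + row + "</td></tr>";
        }
        return html + "</table>";
    }

    // Better: append into a single mutable buffer.
    static String buildReportFast(String[] rows) {
        StringBuilder html = new StringBuilder("<table>");
        for (String row : rows) {
            html.append("<tr><td>").append(row).append("</td></tr>");
        }
        return html.append("</table>").toString();
    }
}
```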
Regex would be another thing.
Regex is another hot topic, regular expression.
And we're all pros on constructing.
Well, you're actually really good at it, aren't you?
Yeah, one of our favorite things to do is construct a regular expression to capture something,
and it's always a lot of fun.
Yeah.
Someday I plan on mastering regular expressions and VI.
Oh.
Through memory.
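On the regex side, a common CPU hotspot is recompiling the same pattern on every call. A small sketch, with a made-up pattern:

```java
import java.util.regex.Pattern;

public class EmailCheck {

    // Compiled once; Pattern.compile() is the expensive part.
    private static final Pattern EMAIL =
            Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");

    static boolean isEmailSlow(String input) {
        // String.matches() recompiles the regex on every invocation.
        return input.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");
    }

    static boolean isEmailFast(String input) {
        return EMAIL.matcher(input).matches();
    }
}
```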
So, but yeah, I mean, the advice that I give people out there, because that's the way I do it.
So I figure out, do we have a CPU hotspot?
And is the CPU hotspot in either my own code, in some of the methods, in some of the algorithms that I wrote?
Or is it related to things like string manipulation?
Is it related to regular expressions?
Is it related to manipulating some other types of
objects? And typically, and that's the beauty of either a profiling tool or an APM tool like
Dynatrace, it shows you down to the method how much CPU consumption that method has, especially
in relation with everything else that is going on in the system. Are there specific, this might be
getting too specific, but are there any specific methods or
at least classes that you would see as a dead giveaway that it is a string concatenation or string manipulation issue?
Well, you know, it's obviously java.lang.String, that's clear, and then I think it's java.util.regex. I mean, it's really the base classes.
The way you see it in Dynatrace,
so if java.lang.String would show up as a problem or if a method shows up as a problem,
what you can also do is you can actually right-click on that method
in Dynatrace and say, show me the source code.
That's also beautiful because you don't have to be the developer
and actually own and have the source code available,
but because we have the bytecode available,
we can just say decompile the bytecode and show me the Java code
and so you can actually see, oh, this is the reason,
this is the loop where the method is actually going through
thousands of iterations of doing string concatenations.
Yeah, and this is, you talk about leveling up a whole bunch, right?
And we all do.
That's a very common theme in the performance world of, you know, going out beyond just writing a script and running it and handing off results.
What I love about these tools is it gives you that ability to start looking at the code behind it, right?
And in the beginning, you know, I had no idea what I was looking at. My code experience was when I was younger using BASIC, right? I had to actually write 10 PRINT "BRIAN", 20 GOTO 10, and run it, and I'd get my name going down the screen, right? I also tried... I don't know if you had them in Austria, but did you ever hear of something like a Choose Your Own Adventure book?
Yeah, where you flip between the pages depending on what you do.
Yeah, so I was trying to write one of those in BASIC, and I got to a point where I added a random element. It took me a while of going to the library, pulling out a book to see how to do the code, because again, this was all pre-internet. And once I conquered that, I got bored with it, so I gave up. But point being, I know it's a fun little sidetrack, but the point being is the more you look under the hood, the more all this stuff is going to make sense to you.
And this is how you level up.
You start understanding what the code is.
You start looking and seeing, oh, the CPU is running hot and we see java.lang.String. Now you know that problem is going to be with the string manipulation, and that then allows you to turn back to the development team or the architects or whoever you're in contact with to say, hey, it looks like we're running hot on the CPU because of string manipulation.
Yeah, and suddenly, bam, you just became a level five dwarf or a level ten warrior or whatever. What I can tell you, though, is that in most cases development will come back and say, you know what, that's not my code.
Yeah, yeah, they'll say there's nothing I can do about it, that's Java code.
But that's when you have to look above it of what's accessing it.
Exactly.
And what now comes to my mind, what you'll see a lot, especially with string manipulations, is your classical XML parsers, your JSON parsers, any type of parsing framework.
Right.
Whether it's XML, whether it is JSON.
So these are all basically frameworks that make it easy for developers to actually consume content, but internally they do a lot of stuff.
And so not that these frameworks are bad per se from a CPU perspective.
Most of them are highly optimized, but still you can use it in a way that they end up having a lot of CPU overhead.
So then it's a good argument and throwing it back to developers saying,
well, here's two options for you.
Either you go to the framework vendor,
go to their website,
and see if there is a performance problem known,
and maybe we need to just upgrade to a new version,
or learn how to better use their framework
to actually overcome their problem,
because maybe you're just calling the framework too often.
Maybe you have not parameterized it correctly so that they can actually internally do some optimizations
All right. So I would love to see the look on a developer's face if I went back to them and suggested what they should do.
Well, honestly, this is great. This is part of the unity, right?
Exactly. Building the team, building the whole DevOps feel of the company. And there's no reason that anybody in the company
should be...
A marketer,
somebody in marketing,
if they happen to be
very geeky and technical,
should be able to
bring that information
to somebody.
You use the DevOps word.
Ooh, we're really also
getting into the mode
of throwing that in
all the time.
I mean, which is true. Basically, that's where the industry is going. Whether you call it DevOps or, for me, it's like agile, right? I mean, we want to... or let's say it that way:
We are there to build better software
and we all have to step up a little bit
and just look out of the box a little bit.
That's what it is.
Right.
It's no longer pointing fingers
and I have mine, you have yours,
and I'll do my thing.
Everybody kind of helps everybody out
and there's no blame.
So CPU again, to kind of sum it up, figure out which methods consume CPU.
If it's not your own code, figure out who is making the call,
and somewhere up the chain, the executions chain,
you will see your own code and maybe it goes through a framework.
Then figure out which framework it is.
Figure out if there's a newer version available.
Figure out if there is any documentation, how to optimize performance in these frameworks.
If it is your own code, then if you use Dynatrace, for instance, right-click, show me the source
code. You'll figure out, oh, it's a loop or it's something else. Show it to the developers.
They will love you for it, hopefully love you for that. And yeah, well, that's the way it is, right?
So I think that's it from a CPU perspective.
So check out the response time hotspot.
Check out the method hotspot.
These are the dashlets that I love the most.
And also what you should do if you're running performance tests,
I think you want to always figure out does the CPU behavior change over time
when you put different types of load on the system,
especially also if your application is dealing with different sets of data?
So it could be that you never see a CPU problem because you're always testing against an empty database,
but then you are testing against a prime database, or if more people are on the system,
and then your app is getting some additional data on the current activity,
and you just have more data to process, then you actually see all the CPU hotspots.
So not just do it on a sample database.
Right.
We covered that a little bit.
Was that the last?
I think so.
I think it was.
That was probably the episode where we had Mark Tomlinson on the call.
Right, right.
Good old Mark.
Yeah.
Marky Mark.
Yeah, just make sure your testing environment is realistic.
Exactly.
And that in itself is a whole different topic, right?
Yeah.
That can be difficult too.
But yeah, you're not going to always see these problems
if you're not testing with the right conditions.
And were there any, you know, I know you're kind of saying
this is the biggest thing with the CPU.
We did mention synchronization and wait.
Are there any big things on those sides, or is that a lot more individualized, things you're just looking out for?
Well, what I'm looking out for, if somebody's testing an app: I think synchronization problems and waiting issues are something you cannot detect with a single-user test, obviously.
Right.
So you have to have some load on the system,
but you might be surprised just with two users.
If you're using, let's say, if you have JMeter, right,
if you have a nice script,
then just crank it up with a couple of virtual users.
Run them in parallel.
It doesn't have to be a lot.
Typically, you find basic synchronization problems
with two or five concurrent users on the system
or simulating the same request in parallel, and then watch out.
And what I really love, what I typically do if I have the chance to run a load test,
so I crank it up from one user to five to 10, 20 over a certain amount of time,
and I watch the metrics over time: how much CPU do we consume, how much time do we spend in sync, how much time do we spend in wait? Because then I can immediately see if we actually have a synchronization or wait problem with increasing load, because a perfectly scaling system will just consume more CPU in proportion to the amount of traffic that is coming in. But if you see that you are shifting over to more waiting and synchronization,
you know this becomes a bottleneck
because the more people that come in,
the more you're syncing.
And that's basically then an architectural issue
where you can say, hey, guys,
with 10 users, we already see 50% synchronization time.
That means we have a serious problem here.
So look at these measures.
In Dynatrace, you can chart.
I think we automatically spit out CPU, sync, wait, all this as measures.
You can just chart it over time and just chart it while you're on the load test.
Right.
And in terms of the wait one, that one always gets a little bit fun, right?
Because there's a couple different patterns.
If you're in an asynchronous thread and you're in wait, that could be okay.
It's asynchronous, and it could just be sitting there waiting for the next thing to pick it up and run with it.
So when you're looking at those wait things, if it's in the synchronous part of the thread, that's when you know you definitely have some kind of an issue. But I've also found that waits can sometimes be fun to talk to developers about
because, like, oh, look, we got stuck in wait for five seconds.
And sometimes their first response is, well, yeah,
we have a five-second timeout on the wait before it tries again.
Not understanding, no, but why are we waiting five seconds?
What got us into the condition of waiting five seconds?
So waits can be a little tricky to communicate as an issue to be dealt with
because people develop code for waits
under certain conditions, so it doesn't break.
But you really just want to get the message across
that, but it's going to those wait states quite a lot now,
whereas it wasn't under one user, two users,
now that we have that concurrent load.
So it can be a little tricky.
It can be very tricky, especially... I mean, I think a lot of developers and architects use it as a way to handle, as you said, asynchronous activity.
And you make an asynchronous call, and then you wait.
But basically, you're blocking your current thread in the caller.
So it would be much more efficient to go with an event-driven model where you actually say, well, I'm doing my stuff.
Now I am triggering off an asynchronous call, but I'm freeing my own thread up for handling
the next incoming request.
And when my initial asynchronous call actually comes back in an event-driven system, I will
then continue with the tasks that I have to do.
But as you said, this is a much tougher discussion than saying fix it because this typically
means a total change in architecture.
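A rough sketch of the contrast being described, using CompletableFuture; the service call itself is hypothetical:

```java
import java.util.concurrent.CompletableFuture;

public class OrderService {

    // Blocking style: the calling thread is parked until the result arrives,
    // which shows up as wait time under load.
    String enrichOrderBlocking(String orderId) throws Exception {
        CompletableFuture<String> details = fetchDetailsAsync(orderId);
        return details.get(); // thread sits in WAITING here
    }

    // Event-driven style: register a callback and free the caller thread immediately.
    CompletableFuture<String> enrichOrderAsync(String orderId) {
        return fetchDetailsAsync(orderId)
                .thenApply(details -> "enriched:" + details); // runs when the result is ready
    }

    // Hypothetical downstream call, e.g. to another service.
    private CompletableFuture<String> fetchDetailsAsync(String orderId) {
        return CompletableFuture.supplyAsync(() -> "details for " + orderId);
    }
}
```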
Right, right, right.
But you know what I like?
What you said is very good.
In the very beginning, you said don't always get fooled by waits, because we see a lot
of waits, especially when you're spawning threads in Java.
Now, if you're taking Tomcat or any other app servers and you basically have incoming
queues, so your threads that are waiting for picking something up, they will wait.
So it's totally normal that they wait because they have to wait for incoming traffic.
Correct.
So don't get fooled by that.
Yeah, I think synchronization is a lot more of a red flag right away than waits.
Again, that's where the leveling up comes in, where you have to figure out what it's doing and why, and if there's a good reason for it, or if it's actually impacting the transactions themselves and causing problems down the line.
So it gets a little bit trickier, but that's when you take your PurePaths and send them to Andy.
Exactly.
But yeah, that's hopeful.
But on the other side, really challenge the developers and show them: hey, with increasing load we see wait time here, and is this wait intended or not? Often it's used with a timeout setting, right? We have a timeout setting, so we make the call, then we just wait for five seconds and then check if the response is here. And if not, then wait another five seconds. And that might be an approach that was
good in the past. But if you cannot change the architecture, you should still think about,
do we really wait for five seconds? Because maybe the response is already here after 100 milliseconds.
So we are basically wasting 4.9 seconds.
So you really need to figure out how to change these timeout settings.
Maybe you want to make them much shorter.
Because if you know the response is, on average here, much faster, then you need to adapt your timeout settings for that.
Right. Why wait five seconds
when you can go right away? Or, you know, if your timeout is 500 milliseconds, then maybe you go through four waits until it picks up, but that's fine. It's better than five seconds.
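A minimal sketch of that idea, waiting in short slices under an overall deadline instead of a fixed five-second block per retry; the names are made up:

```java
public class ResponsePoller {

    private final Object lock = new Object();
    private boolean responseReady = false;

    // Instead of lock.wait(5000) once per retry, wait in short slices so a
    // response that arrives after 100 ms is not stuck behind a 5 s timeout.
    boolean awaitResponse(long overallTimeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + overallTimeoutMillis;
        synchronized (lock) {
            while (!responseReady && System.currentTimeMillis() < deadline) {
                lock.wait(100);
            }
            return responseReady;
        }
    }

    void onResponse() {
        synchronized (lock) {
            responseReady = true;
            lock.notifyAll();
        }
    }
}
```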
Yeah, exactly. And another thing that just comes to my mind: synchronizing wait settings across the different tiers. Because typically, you know, you have tiers talking with each other, and you want to make sure that your timeout setting is the same as on the other side.
So, for instance, if I call an external service and I give it 60 seconds timeout,
but on the other side, the service that picks it up is automatically throwing an HTTP 500 anyway
after 30 seconds if they're not done, just like maybe the default setting by the app
server, then I need to sync it, because otherwise I'm just waiting 30 additional seconds for something that I know is here after 30 seconds at the latest. So synchronize the wait times and the timeouts.
Yes, but synchronizing used in a different way.
Exactly. You're reminding me, I know it's not my first language.
No, I mean, it's the same.
I would use the same word, too.
I think it's just separating it from a synchronization problem versus synchronizing meaning like,
hey, let's all synchronize our watches because we're going to go spy at the embassy and make sure we get the top secret documents.
Which embassy are we spying on this week?
What's the one?
There was an old Peter Sellers movie called The Mouse That Roared. I forget the name of the fictional country. If you haven't seen that movie, though, go see it. It's an old black-and-white Peter Sellers movie.
So was that before my time, I guess?
Did you ever hear of the original Pink Panther movies?
Oh yeah, yeah.
So the guy who played in the Pink Panther was Peter Sellers. Another great movie he was in was Dr. Strangelove.
It was a Stanley Kubrick movie.
I was a film major before that.
Anyway, yes, this is, well, before my time, too.
These are all made before I was born.
So should we reveal the age now?
No.
No, that could be one of the, well, I guess maybe anything else on CPU or no?
I think that's it.
So just remember, check out the response time hotspot, the method hotspot.
We show you CPU, I/O, sync, wait, and garbage collection too.
Right.
And if your tool doesn't break it down that way, you know, you could still figure this out.
You just have to really look at what those methods being invoked are doing. It's going
to take a little more intuition, but it's definitely something you can do or should be
able to do with any kind of either profiler APM tool. But again, once you're into multi-nodes
or microservices, you really want to kind of have some sort of APM tool in there to really
give you that full vision. But speaking of the age thing, so we're going to introduce a new segment to our show.
It's the trivia segment.
And borrowing from Marvel Comics and the old days, they used to give out a no prize, N-O
prize, for people who would catch inconsistencies in the storylines from different episodes. Ours is going to be the K-N-O-W Prize,
where basically if you are the first person to get the answer correct
by tweeting your answer to Dynatrace with hashtag pure performance and no prize,
although if you just get pure performance in there, I'm sure we'll pick it up,
but try to get the two of them in there. We will put your name as the KNOW Prize winner on the episode on the site. So what we're going to try to do with these trivia questions, though, is make it so you can't just Google them. And even if you can Google them, or Bing them, right...
Or Bing them. You cannot use a search engine. You're very Bing-friendly.
No, the thing is, the reason why I am is because recently somebody pointed out to me,
you always only mention Google.
And it was a Microsoft person that said that to me. Well, Google is more of a verb at this point.
I know.
Yeah, point taken.
You can use your favorite search engine.
Here we go.
Yes, WebCrawler.
Basically, don't do that.
Even if it is one you can search, please don't Google it.
Please don't search it.
Try to use your brain or make some fun guesses because it'll be a lot more fun for everybody that way.
And you're not really getting any real prize anyway except for your name on a website, which I guess isn't that exciting.
Well, it is.
Come on.
It's a pure performance website.
What's wrong with you?
Your name will be in the internet.
Exactly.
So in the tradition of something that you cannot easily search, the first trivia question,
first the inaugural trivia question of pure performance, ladies and gentlemen,
maybe I'll see if I can get a drumroll sound effect.
There we go.
That's better than any drumroll sound effect I can possibly find, is what is the first computer I, Brian Wilson, ever used?
Now, I'm not talking about a calculator.
I remember when I took a programming class in college, the professor was like, anything with a chip is a computer, technically.
So I'm not talking about something like a calculator or a stopwatch or a wristwatch.
I mean an actual computer that has a keyboard and you can type stuff in on.
So if you can figure out what that first one was that I used, then you'll get your name up there.
That's awesome.
And I want to give the audience a hint, though, okay? Because they should know: are you 20, are you 30, are you 40? I can see that your hair is already a little whiter. That's a lot whiter than mine.
Yeah, so I am 42.
Oh, you know what, that just ruined another question. Well, it didn't ruin another question, but bonus prize, and this is a real easy one, low-hanging fruit for any true geeks out there: the bonus prize is, what is 42? But the real prize is, what's the first computer I used? Hey, do you have any idea what 42 is?
Probably not. I... I think I have an idea, but I don't want to spoil it. I don't want to give the
audience the answer.
I just didn't know if you had any idea, if you were a true geek or not.
Yeah.
A true geek who... yeah, I guess you'd have to have a very strong command of the English language and have grown up with it, which you mostly have, I think.
Anyhow.
So that's it.
So, again, tweet your answer to at Dynatrace, hashtag pure performance, hashtag no prize, K-N-O-W-P-R-I-Z-E.
And the first one we get, we'll get their name.
If we could use the, was it flash or blink tag from way back, we would do that.
But they killed that one, unfortunately, long ago.
It was a great one.
I remember it was one of my first websites I built.
Was it in GeoCities?
GeoCities always had the best websites.
No, I remember when I was in high school and we built our first HTML pages.
That was awesome.
With the blinking text and then with the text.
What was it called?
The scroll.
The banner, the scroll, yeah.
It was awesome.
And all those great sites were usually built on GeoCities,
and the page would load and it would have some kind of maybe a dancing cat image,
like a drawn one, and there was a MIDI file playing some really bad music.
Yeah, yeah, yeah.
Those were the really good old days of the Internet.
Anyway, so let's go into the next topic.
So the next performance topic we're going to talk about is memory.
Yeah.
So on the memory side, in Java in general, I mean, memory is mysterious, I would say, and it's not that easy. There's a whole science, I think, about optimizing memory usage and garbage collection. We cannot go into every single detail, but we should cover it a little bit. I think, in general, there are two big things in memory.
On the one side, it's obviously the classical memory leak,
meaning memory is growing and growing and growing,
and then the garbage collector tries to clear it up,
but at some point, what happens?
It runs out.
It runs out.
That's awesome.
So what happens when it runs out?
It crashes.
It crashes.
So before it crashes, what happens?
Well, you'll see the CPU spinning up really high.
Everything will slow down tremendously, and everyone will be really mad.
And you'll get alerts and alerts, hopefully.
Exactly.
Hopefully you get alerts.
So hopefully, typically, you'll get an out-of-memory exception.
So Java throws an OutOfMemoryError,
and then if it cannot really recover, then it just really crashes.
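If you want the JVM to write a heap dump for you at that point, the standard HotSpot flags look roughly like this; the heap size and path are placeholders:

```
java -Xmx2g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/heapdumps \
     -jar myapp.jar
```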
And before you go on, though, I do want to say, and we're not going to go into this at this point either,
but you're talking about a memory leak.
We're talking about a heap memory leak.
Exactly.
There's also a native memory leak, which is very, very difficult.
But it's also important in a Java world, because obviously not all code that runs in your Java app is Java code running within the JVM; it may load some native libraries and therefore allocate native heap space.
So that's why also in Dynatrace, when you do a memory dump, we actually show you how much heap memory do you have used and what's the overall memory consumption of the process.
And basically the difference
then is part of the
native memory. And then we actually show you how the native
memory grows over time and whether you have
a native memory leak, which could be you may
have brought in some external library that is
doing something natively and they keep allocating
memory and therefore
you run out of memory. But let's go back to the Java heap space.
Yeah. So memory leaks are a classic one, and basically what you do is keep watching your memory counters. So for Java, you have the different heap spaces, right? What do we have, Brian?
Um, yes, well, it depends on which Java you're using. Java, right? Do you still say Java?
Java, yeah.
That's a quick sidetrack. What always really impressed me about the whole Dynatrace team in Europe, with English as a second language, is they somehow picked up the Boston Java.
But anyway, what do we have?
We have Perm.
I'm blanking on this right now.
We have Perm.
Eden.
Yeah, Eden Survivor.
Tenured.
Yeah, Tenured.
But there's different ones with some different frameworks.
It depends on which JVM, right?
It depends if it's the Oracle JVM, is it the IBM JVM, is it the SAP.
There's different JVMs out there.
They call it different.
But basically, it is heap generations.
Correct.
From young objects to old objects.
And what you want to do, you want to monitor all of these heap spaces.
And it's all exposed through JMX, typically. That means you can see how much utilization you have, and you can also see how much garbage collection happens in each of these spaces. And if you basically see constant growth of the survivor space, which is basically where objects end up if they cannot get cleared, and it reaches a certain level, it reaches the top, garbage collection kicks in, and then it crashes, then you know you have a classic memory leak.
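Those counters are also available programmatically through the standard java.lang.management MXBeans, so a rough self-monitoring sketch (pool names differ between JVM vendors) could look like this:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class HeapWatcher {

    // Prints usage per heap pool (Eden, Survivor, Tenured/Old Gen, ...) and GC totals.
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                System.out.printf("%-35s used=%,d bytes%n",
                        pool.getName(), pool.getUsage().getUsed());
            }
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%-35s collections=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(10_000); // sample every 10 seconds
        }
    }
}
```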
Because typically what you should see if you don't have a memory leak
is the classical, what's it called, the sawtooth?
Yeah, I call it, is it a sawtooth?
Yeah.
It's a sawtooth pattern, yeah.
So that's what you should,
because basically memory grows.
It grows because your application is allocating memory, and then the objects that are no longer needed eventually get garbage collected, so you should always come back to the bottom of the valley. And garbage collection is fine, garbage collection is necessary. You're never going to see no garbage collection, but you shouldn't see it running for so long that it's impacting your performance, right?
So typically, you know, you were talking about the survivor space.
That's where memory goes.
I don't want to say to die, but that's where it ends up when it's something that's used for longer periods of time. But those garbage collections usually have much more of an impact.
So you should be seeing a lot of, in a really good system, you'll see, you might end up seeing a lot of GC invocations,
but they're going to be in the Eden space and they're going to be really short and fast and
they're not going to have that impact. Of course, you know, by the design of an application, the whole idea is, as soon as an object is no longer needed, it should get dumped, right? It's only when those objects are needed longer that they sit around longer and end up going into those... what is it after Eden, Survivor?
Exactly, and then Tenured. But there always has to be a good reason for that, and that's where usually these problems come in, where people forget about these objects.
Yeah.
And typically, so yeah, watch out for them: if the objects are promoted into the different higher regions,
higher heap spaces,
and then if it keeps growing and doesn't come down after a garbage collection run.
So the typical things that I've seen is, and as you said, there's a good reason why objects are living up there
because you have caching frameworks that cache objects.
That's why these objects stay there.
They memory cache.
Oftentimes you'll see that building quite a lot during startup of the application
because it's got to load everything in, and you might see a quick ramp,
but then again you should see it stabilize.
Exactly.
But the thing is if you have any configuration issues with that caching,
that means if the caching strategy itself or the caching framework
is never actually allowing very old objects to fall out of the cache
and not basically understanding that if it keeps putting more and more objects up there,
that it's eventually crashing the system.
That's a classical memory leak.
So either buggy versions of caching frameworks,
misconfigured versions where you don't have any expiring policies for these objects in the cache.
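A minimal sketch of that classic pattern, assuming a hand-rolled cache with no eviction policy; the class and method names are invented:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProductCache {

    // Leak: a static map that only ever grows. Every entry stays reachable from a
    // GC root forever, so these objects get promoted into the old generation and
    // survive every collection until the JVM runs out of heap.
    private static final Map<String, Product> CACHE = new ConcurrentHashMap<>();

    static Product lookup(String id) {
        return CACHE.computeIfAbsent(id, ProductCache::loadFromDatabase);
    }

    // Hypothetical loader; in a real app this would hit the database.
    private static Product loadFromDatabase(String id) {
        return new Product(id);
    }

    static class Product {
        final String id;
        Product(String id) { this.id = id; }
    }
}
```

A bounded cache, or one with an expiry policy, turns the same code into normal memory usage with the sawtooth pattern described above.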
So this is then typically very interesting.
How can you find out about it?
You can take a memory dump.
So you can either take a memory dump when you see, oh, we're reaching that point,
or if you're running and testing, you run some load, you see the memory going up,
then take a memory dump.
Another option would be wait for the system to crash,
and if you have the chance, then the JVM actually takes a memory dump for you.
That's one option.
In Dynatrace, we also have the feature
if an application crashes and we get the chance,
then we actually capture the memory dump.
Right.
And then actually look at it and see,
hey, which objects are out there?
Because we actually see which objects are still on the heap,
and then typically you find it's probably your own objects
that are then referenced by one of these caching frameworks.
Right, and the other fun thing you see quite a lot of
when you take a memory dump is a lot of string references
tying it back to the CPU, right?
Well, the reason why that is, obviously,
is because if you look at an object,
if you look at a business object,
let's say a person or an order,
what is an order?
An order is a set of values.
And what are these values?
They're either strings or integers.
So in the end, obviously, the most consuming part on the heap are going to be these primitive types.
But they are referenced from these complex objects, business objects.
Right.
And that's actually one of the tricky things about memory, at least in the beginning, is when you're looking at a memory dump, oftentimes your largest consumers of memory are going to be these placeholders for strings and integers and all these other components.
So you might initially say, oh, it's a string issue, right?
But it might not.
It's usually not because you're usually going to see quite a lot of that.
You've got to look a little bit down in the heap oftentimes.
But there are quite oftentimes when it is the string.
But the nice thing about this, so what we try to solve, and I'm sure other memory profiling tools do the same thing,
but we actually show you how much memory is referenced by these business objects.
So we can actually tell you, well, the object itself
doesn't have a whole lot of memory,
but because it references 50 strings
and 100 integers,
the total garbage collection
or the total memory it actually holds onto
is X amount.
And that's why I love the Dynatrace memory dashlet
where you can actually say,
show me the objects that are responsible for keeping how much memory on the heap,
and whether it's that object itself or the object's references.
Yeah, and I think another fun thing when you're taking a dump,
and I haven't used a lot of memory tools outside of Dynatrace,
so I don't know if this is common to a lot of them,
but I love the concept of whether or not you trigger a garbage collection
with the dump or not.
Right?
Because that's going to at least, you know, if you want to see,
if you don't trigger, there's different reasons to do either one, right?
Is this a common feature or is this something that's just kind of?
You know, I think I'm just the same way as you are. We're so used to Dynatrace because we have it available, because once Dynatrace is in a Java app you can just click a button and we get there. I'm pretty sure that this is a feature that most tools have out there, and it's great. And it comes kind of to the second memory problem that people have. But let me just finalize on the memory leak.
That's good.
So to finalize on the memory leak: figure out which objects are on the heap, which of them take most of the memory, and don't get fooled just by strings, as you said, Brian, correctly. And then walk back the referrer tree. So if you do a full memory dump, you can actually see who is referencing it, and then you will see, oh, okay, my business objects are holding a gigabyte of memory, and who is referencing them? Who is holding them? And then you can typically step back all the way until you find a global array,
with a framework that you're using, or maybe it was your developers that actually put stuff
into these objects.
Another good example are session objects.
So if you have a web application, you keep your user objects, your user sessions, and if you add more data to the session, then these sessions grow and grow and grow over time.
And these sessions are kept in memory by the application server, depending on the timeout
setting you have for your user sessions.
Typically, I think 30 minutes these days.
Yeah.
And that's why, you know, these objects grow and grow and eventually bring your application
to crash.
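A hedged sketch of how that session bloat typically creeps in; the servlet and attribute names are made up:

```java
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.util.ArrayList;
import java.util.List;

public class SearchServlet extends HttpServlet {

    @Override
    @SuppressWarnings("unchecked")
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        // Every search result list is parked in the session "for later".
        // With a 30-minute session timeout, each active user holds all of
        // this on the heap until the session expires.
        List<List<String>> history =
                (List<List<String>>) req.getSession().getAttribute("searchHistory");
        if (history == null) {
            history = new ArrayList<>();
            req.getSession().setAttribute("searchHistory", history);
        }
        history.add(runSearch(req.getParameter("q")));
    }

    // Hypothetical search; imagine it returning the full result set.
    private List<String> runSearch(String query) {
        return new ArrayList<>();
    }
}
```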
And one thing you can do when you're looking at those as well is do a search on the memory
or the caching mechanism.
Like, I remember looking at one about two years ago.
It was HashMap, right? And I don't think I'm going to remember the exact component of it, but basically HashMap was meant to be used in one way, but it was very commonly used in another way to achieve another thing, just because it could do it, right? And people were like, hey, we can use HashMap to leverage this, and that's awesome, hijack another framework to do something else if it's really efficient at it. But what I remember looking at when I looked it up,
when I did a search engine look for it,
was if you're going to use it for that alternate component,
there's a major setting you have to change on it
because otherwise it's going to lock everything up
and be allocating everything into long-term storage on it.
And I believe in that case that was what it was.
But again, I'm still not, I would never consider myself a memory expert.
But at that time, I knew even less.
But just doing a quick search on seeing which, you know, some of the identifiers that you're seeing in that heap oftentimes can help point you to what maybe some of those common mistakes might be to maybe not here's the solution to everybody, but more of here are some ideas we might want to consider to look at,
just to help the rest of the team out with some ideas there.
I love that you mentioned hash maps because basically hash maps, hash tables,
they're all collections, basically collection objects.
And if you do a Dynatrace memory dump and you look at the memory dashlet on the bottom,
remember the tabs that you see?
We show you obviously the biggest objects,
but then really focusing on the biggest collections,
and then basically tell you exactly if you have a problem there.
And I think we also show the biggest user objects,
like session objects on an application server,
and our favorite, again, the strings.
All the strings and
the duplicated strings. That's crazy, because the same string might be allocated 50,000 times and you don't even know about it, because your frameworks are going like crazy.
Yeah, yeah.
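As an illustration of the duplicate-string problem, here is an invented row-mapping example; interning is just one option and has its own costs, so treat this as a sketch rather than a recommendation:

```java
import java.util.ArrayList;
import java.util.List;

public class OrderStatusLoader {

    // Hypothetical row mapper: each row gets its own copy of "ACTIVE"/"SHIPPED",
    // so the same few status values can exist hundreds of thousands of times on the heap.
    static List<String> loadStatusesNaive(List<char[]> rawRows) {
        List<String> statuses = new ArrayList<>();
        for (char[] raw : rawRows) {
            statuses.add(new String(raw)); // a fresh String per row
        }
        return statuses;
    }

    // One option: intern (or use a small lookup map) so identical values share one instance.
    static List<String> loadStatusesDeduplicated(List<char[]> rawRows) {
        List<String> statuses = new ArrayList<>();
        for (char[] raw : rawRows) {
            statuses.add(new String(raw).intern());
        }
        return statuses;
    }
}
```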
So, the second memory problem.
Yeah. So you mentioned the way you triggered it before, which you did pretty well. You were saying there's an option in Dynatrace where you can say, before I create a memory dump,
I want to trigger the GC
so that I basically see
which objects are actually staying on the heap
after the GC has done its work.
Correct.
So do you want to tell the audience
what you think while we have this feature?
Yeah, so listen,
if you're going to take a memory dump
and not clean out your GC, especially if you're looking at overall consumption, right, or if your memory is pretty large at the time: if you take a memory dump and you don't run the GC first, you don't know what's about to get cleared. So you're potentially looking at a lot of extraneous objects in memory that have no impact on the problem of, you know, large
memory consumption. So if you run that GC beforehand, you're going to be looking at everything
that's stuck in the system that can't get cleared out, that is still being leveraged and utilized.
And if you see something like, you know, an order ID in survivor space, then you know very easily that you have an issue.
But there are the times when you not necessarily
would want to trigger a GC,
and I'm blanking on it now.
I've explained this to people in the past,
and I just can't...
Help me out here.
Why would you want to trigger the GC?
I'm seeing you chewing your teeth.
So basically, here's, I think, for me, the second thing.
I totally agree with you, what you said on why garbage collection.
Now, the second thing why we actually see a lot of GC activity is if you have a high object churn rate,
which means your application is actually allocating
a lot of objects that are short-lived.
So that means you're allocating them,
they live for a little while, and they get garbage collected.
So in this case, I don't want to run the GC, because I want to find out if we're allocating too many objects, even though they're garbage collected all the time. This is another problem pattern. So we can find out about it by creating memory dumps on a consecutive basis but not triggering the GC, because you want to see how many of these maybe small, short-living objects we have on the heap. And we do this over time, and then we can figure out, wow, we have this XML parser object, and we
have 10 000 instances of it coming and going, coming and going, coming and going.
And it means, well, they're not a memory leak,
but they mean that the garbage collector
always has to clear a lot of objects.
Right, right.
And basically, by knowing that,
we can talk to the developers,
hey, maybe instead of always allocating
a new XML parser object,
maybe we can reuse them.
Maybe you create one XML parser object per thread,
and then you solve at least concurrency issues
instead of every time a request comes in,
creating a new instance of it that is very short-living.
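A rough sketch of that per-thread reuse idea using ThreadLocal with the standard JAXP DocumentBuilder; treat the structure as illustrative:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class XmlParsers {

    // One DocumentBuilder per thread instead of a new one per request.
    // DocumentBuilder is not thread-safe, so ThreadLocal gives reuse
    // without adding synchronization, and it cuts the object churn.
    private static final ThreadLocal<DocumentBuilder> PARSER =
            ThreadLocal.withInitial(() -> {
                try {
                    return DocumentBuilderFactory.newInstance().newDocumentBuilder();
                } catch (ParserConfigurationException e) {
                    throw new IllegalStateException(e);
                }
            });

    static DocumentBuilder parser() {
        DocumentBuilder builder = PARSER.get();
        builder.reset(); // clear state left over from the previous request
        return builder;
    }
}
```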
And this kind of points back to, I believe,
the difference between having a memory leak
versus the memory problem where your GC is too high.
Exactly.
And you can have excessive GC without a memory leak,
and this is the exact kind of a case where that would be happening, where you're just
putting way too many items that are going to get cleaned up into the system.
And GC is just always, always, always running.
And again, garbage collection is consuming your CPU.
So if that's going too heavy, it's going to impact your code.
And if you're not looking at the garbage collection, you'll be looking at your code.
I've had developers, before I had insight into garbage collection, saying: but this code is not doing anything with string concatenations or string manipulations.
This is not running anything on CPU.
So why would I be, there's got to be something else hitting the CPU up high.
And if you're not, again, thinking of garbage collection,
you might make the mistake of first going,
oh, what else is running on the box, right?
But stick with the application first before you look outside
and you have to look at those garbage collections.
I think the technical term or the industry term
is called object churning.
So through how many objects does the GC churn through
all the time because you're allocating so many of them?
So, and yeah, I mean, I think that's the two major things,
memory leak and high object churn rate,
which in both cases lead to garbage collection,
but the one is more like the signal before the thing dies, right?
Yeah, think about it as your garbage can at home.
Yeah.
The memory leak is people keep putting stuff in the trash
and no one wants to take the trash out,
so it starts overflowing.
And then whoever puts the last piece in that overflows,
it gets in trouble, right?
And in this case, your app dies.
Yeah.
And the other one is basically
we're just throwing everything out all the time
and then we constantly have to run and run and run.
And basically it's a lot of overhead, so maybe we should throw less things away.
I mean, yeah, less garbage.
I'm just going back to an example of at your home or apartment or something.
I don't know why I'm going this route, but another idea popped into my head where you can think of the garbage collection without the memory issue: it's when you're taking a shower and someone else goes ahead and flushes the toilet. You have too much cold water being flushed away, and then you get burned and scalded, because you just keep on dumping all the water down the drain.
All right, so there's... bringing it back to how your brain works?
You don't want to know. I sometimes don't understand how my brain works.
So there's obviously more stuff that we could talk about with memory.
But what I want to remind people, we have an excellent Java performance book online.
I think if you Google for Java performance book Dynatrace, you will find it.
And Michael Kopp, back in the years, three or four years ago, he wrote the chapter on memory and garbage collection.
And he really did a phenomenal job explaining
how memory is actually managed by the different JVMs,
the different garbage collection options that you have,
because we haven't even talked about different ways
we can optimize the garbage collector as well.
There's different modes.
Yeah, there's a whole...
That's a whole science.
Yeah.
So check it out.
Check out the Java Performance book
and look at the memory chapter,
and also we have blog posts on blog.dynatrace.com about real-life scenarios where our customers actually showed us how they found memory leaks in their different environments.
Great, great.
And again, it's not just if you're, you know, yes, we'd love for you to be using Dynatrace, but these are articles and ideas that are going to help you no matter what you're using, no matter where you're working.
This is core stuff.
Exactly.
Right?
And I'm glad you mentioned the Java performance book because that's actually where I first started reading in depth about memory.
And probably the reason my mind was drawing such a blank is because I need to go back and review it some more.
And some of the stories... I remember one, an Oracle JDBC driver memory leak that brought a whole IBM WebSphere cluster down.
And so these are the stories that we have out there and kind of showing how it works.
I have a video on my YouTube channel, which is a 15-minute sanity check on Java memory.
What's your YouTube channel?
My YouTube channel?
Oh, so I have my bit.ly, so bit.ly slash DT tutorials.
All one word.
DT is like in Dynatrace, DT and tutorials.
And is that clear enough?
Yeah, DT tutorials.
So bit.ly slash DT tutorials.
That's awesome.
Because I don't mumble at all.
So between.
No, I do.
I'm joking.
I mumble quite a lot.
I try not to when I'm doing this.
But between your accent and my mumbling, I'm sure this is the...
I hope so.
They figure it out, right?
Give it a couple of trial and errors.
Yeah, but that's good.
And remember kind of maybe to remind people that we also have the Dynatrace free trial that you can download.
So I'll say it once and then you repeat it so that people really get it: bit.ly slash DT personal.
So that is bit.ly slash DT personal.
Exactly. So that basically brings you to the registration page for the Dynatrace free trial slash personal license. We call it personal because it becomes personal after the 30 days. So register that.
And yeah, if you have any questions or feedback, you can always email us at pureperformance at dynatrace.com. If you want to follow me, I'm not too active on Twitter, but I'm slowly trying to become more, but I
am Emperor Wilson, E-M-P-E-R-O-R-W-I-L-S-O-N. And we have... what's that? I always forget yours.
It's okay. It's GrabnerAndy, so it's G-R-A-B-N-E-R-A-N-D-I.
Correct. That's Twitter, and there's also, of course, Dynatrace on Twitter. And don't forget, if you know the answer to our trivia question today, what was the first computer I ever used,
remember to send that to at Dynatrace, hashtag pureperformance, hashtag no prize, K-N-O-W-P-R-I-Z-E.
And anything, you know, we are, just so the listeners have an understanding of where we are, we are still in pre-production on all this.
I'd like to give a little bit of the background just because, well, that's always been the stuff that fascinated me.
These will all be soon very, very much going live, and we're very excited for that to be happening.
So if you've been listening and following us, we thank you very, very much for that,
and we'll continue to have a whole bunch of these coming out.
We are up on Spreaker,
and we also have a page on Dynatrace that's being worked on,
so I cannot give out a URL quite yet,
but I'm sure if you are listening to this
and you search Dynatrace Pure Performance,
it will come up in your favorite search engine.
Exactly.
And I think from talking about Java performance, hotspots, performance problems,
we have a long list.
I think we did not get through the database because it's too long.
But databases, web services, messaging, message queues, threads, pools,
these are all topics that are very hot that we should cover in one of the upcoming.
Right.
And if there's other ones that you're interested in and you'd like to hear some ideas on, please
communicate to us in any of those methods I mentioned before.
Even general show ideas, too, if there's something you find fascinating, like cloud.
How do you deal with cloud?
Or any other kind of topics that you think it would be good to hear more about, send us a note. You know, we have a lot of people we can pull in for a lot of great conversations, so we'd love to hear ideas from you.
And what if people want to be on air with us?
Well, they have to get a broadcast license. Now, if you want to be on air, listen: if you
have a great knowledge base or some
great experiences, let's say you became the performance hero and tackled some amazing
performance issue, might've been one of these common ones, but you have a personal experience
of how you went about doing it. Any, you know, if you have ideas and you'd like to be on air with
us and talk, yeah, send us an email, pureperformance at dynatrace.com,
and let us know what you're thinking,
and we'll try to work that out.
And we promise that we don't make jokes about them,
and we don't harass them, right?
We promise to be good.
Yeah, unless we know you.
Of course.
If we know you, you're going to get harassed.
But if we don't, we'll be very, very nice to you,
and we will give you some virtual chocolates somehow.
I don't know what that even means.
Again, there's my brain going in some way.
Anyhow, we look forward to any feedback, any ideas or suggestions, and I will hand it over to you, Andy, as he's logging into his laptop. Any final thoughts before we sign off?
Any final thoughts?
No, I hope the weather is getting better because it's really cold out there
and we need to walk over to the meetup later.
And other than that, I think just keep sending us these stories,
especially me.
Now, this is not my personal interest.
Send me these PurePaths.
Use my Share Your PurePath program.
Share any other stories.
If you are in the
area where I am, follow me on Twitter. I typically post where I'm traveling to, right? Look me up and just hunt me down. I'm always happy to share a beer and obviously a lot of stories with you over a beer. Or find me on one of the dance floors. I typically go out salsa dancing in the cities I visit, so if you have any recommendations on salsa places in your city,
let me know.
And then we'll find an excuse that I get there.
Of course.
Yeah.
All right.
Well,
thank you all very much.
We'll see you next time.
Goodbye.
Goodbye.