PurePerformance - 004 Top Java Performance Problems
Episode Date: June 6, 2016
The Java Runtime has become so fast that it shouldn't be the first one to blame when looking at performance problems. We agree: the runtime is great, JIT and garbage collection are amazing. But bad code on a fast runtime is still bad code. And it is not only your code, but the 80-90% of code that you do not control, such as Hibernate, Spring, app-server-specific implementations or the Java core libraries. Listen to this podcast to learn about the top Java performance problems we have seen in the last months. Learn how to detect bad database access patterns, memory leaks, thread contention and, well, simply bad code resulting in high CPU utilization, synchronization issues or even crashes!
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello and welcome to another episode of Pure Performance.
My name is Brian Wilson and with me as always I have Andy Grabner.
Hello Andy, how are you doing today?
Hey Brian, I'm pretty good. I'm just a little shocked about the snowflakes out there. And it is end of April. I think it's just stopped.
But we're actually recording out of Denver, Colorado, right?
Your hometown.
Downtown Denver.
We're recording live in Denver, Colorado.
That's amazing.
We're always recording live, though, aren't we?
Of course.
Yeah, we're at a hotel in downtown Denver.
Andy's in town for, what are you in town for?
I know this is going to be past already when our loyal followers listen to this.
But just in case they're interested in what the topic is that you're going to be addressing tonight,
what is it that you're doing here in Denver?
Well, besides salsa dancing, what I did last night, I actually go to a meetup tonight.
I'm presenting at the Pivotal Cloud Foundry meetup, and I'm talking about metrics-driven continuous delivery. So basically baking performance metrics into the pipeline
so that people that actually deliver software
and run it on Pivotal Cloud Foundry
don't just run their apps
and then go totally crazy over performance problems.
So yeah, it's about performance, but metrics-driven.
That's what I'm here for.
Yes, you're always about performance.
So Cloud Foundry is really cool stuff. I'm just starting to get into it a little bit too, but unfortunately
I cannot be there. It is my daughter's fifth birthday today, so happy birthday, Vivian.
Happy birthday. Yay. So today we have a fun show. It's probably going to be one in a series of shows
about top Java performance problems and problem patterns.
Did I say that right?
You had a better title, I believe, didn't you?
I think, well, top performance problems in Java or something like that.
And as you said, it's probably going to be a series, because there are just so many of these problem
patterns.
And we've been talking about them for years and years.
And that's also why our customers, I would say, and the users of Dynatrace, you know,
they keep bringing us these examples of extremely bad
problems that happen in the Java Enterprise
apps. Right, and
so we're going to be covering, we're going to attempt
to cover CPU hotspots,
some memory patterns
and problems, and if we have
time, we will get on to
the top database problems.
You might have heard or seen some
stuff about the database, but it's a very important topic and one that we definitely want to tackle. If we don't
get the database today, though, we will definitely get into it in the next one. And there's plenty
more that we're going to cover in future Java performance problem episodes. But before we
start, well, one thing: there are several ways you can contact us.
You can send an email to pureperformance at dynatrace.com.
That's P-U-R-E-P-E-R-F-O-R-M-A-N-C-E at dynatrace.com.
Also, my Twitter handle is Emperor Wilson, so E-M-P-E-R-O-R-W-I-L-S-O-N.
And Andy, what's your Twitter?
My Twitter handle is GrabnerAndy. So it's G-R-A-B-N-E-R-A-N-D-I.
Yes, A-N-D-I, not A-N-D-Y.
Exactly. That's what I mean. And any other ways? I think people can probably tweet and
should follow the Dynatrace handle.
That one too, yes. So it's just at Dynatrace, D-Y-N-A-T-R-A-C-E. Exactly. And I think
the Twitter handle for Pure Performance is taken, but we could obviously use a hashtag. So feel free
to use the hashtag, hash Pure Performance, if you have any questions, and we'll follow that too.
Right. And probably the best way in general would be tweeting over to Dynatrace, hashtag
Pure Performance. And that's it.
So anything else you wanted to bring up before we start?
No, I think let's get started.
Oh, yeah.
Actually, thank you.
It's awesome that we sit in the same room and you keep reminding me about stuff.
And that's actually the segue over to getting started.
A lot of the problems that we see and the reason why we can talk about it
and write about it and speak smart about it
is because a lot of our people out there use Dynatrace, but not only the paid version of Dynatrace, but also the free trial.
So I'm promoting the free trial because I run the free trial program.
So if you ever want to test the stuff out that we are doing and talking about,
just go online and search for the Dynatrace free trial. Register for it.
You get a 30-day free trial.
And then the cool thing is it converts over to a personal license,
which means you can keep using Dynatrace full product, full feature on the apps that run on your local machine.
So perfect for developers to do the sanity checks.
Perfect for testers, I would say, as well.
If you're testing some apps on a machine, install Dynatrace on it, and then you get all these cool things.
And you can actually identify.
And now there's a segue over, too.
And segue is actually fun, because the first company I worked for was also called Segue Software.
So let's segue over to the Dynatrace free trial and personal license. Most problems that I've been talking about and writing about
actually came in from people that leveraged the fact that I have a program
that is called Share Your Pure Path,
so people can use Dynatrace, export the data, and then send it over to me.
And, Brian, believe it or not, Java is the number one technology, I think,
that we still see out there.
That's also why I guess we picked, at least from our customer perspective, right?
They write a lot of apps.
Right.
And I got a total of 200 people in the last 12 months that sent me that data.
How many did you say?
200.
Okay.
I thought you said 1,200.
1,200 would be a little excessive.
Well, I would love to.
But 200 people sent me that data in the last 12 months.
And I analyzed them.
And now I think I want to talk about what we saw on the Java side.
Right.
So we'll start with the CPU hotspots.
And for everyone out there, obviously, a CPU hotspot is a hotspot in the CPU.
But more importantly, there are many different kinds of CPU hotspots, right? There's just general CPU consumption, and we can also be talking about synchronization issues and wait issues. So why don't you start with some of your favorites? You see a lot more of this than I do.
Yeah. So the first thing I wanted to say: a lot of people say, well, for CPU hotspots I can just use my profiler.
Why do I need another tool?
And I agree with them.
And I say, if you have a single-tier app and you're a developer and you know how to use your profiling tools, then it's great.
But A, those people that may analyze performance don't have a profiler.
They don't have the IDE where they can do all that stuff.
So that's why they are going to tools like Dynatrace or other APM tools. And also if
you have a distributed app, if you have one tier calling another tier, if you think about
a microservice architecture or a service-oriented architecture, then you need to actually trace
CPU hotspot from one tier to another tier. And that's why it's so nice to have more professional
tools available. So when we talk about
CPU hotspots, I typically, what I do
when I get, when people share data with me, I
open up the response time hotspot dashboard in
Dynatrace. And as the name it says,
it gives me the hotspots of the response
time and it actually breaks it down into what you said,
Brian. The CPU,
like which methods actually consume CPU.
It also shows me
synchronization time.
It shows me wait time.
So sync would be if code is actually syncing on each other. If you have sync blocks or sync methods,
and they have to wait for each other to actually enter that block,
wait would be waiting on an object.
And the fourth one would be I/O, which is also very interesting.
Right.
And when you say waiting on an object, are you talking about a memory object? The only reason I'm asking is that you used the word wait with both sync and wait, right? So let's just clarify that for everybody, because sometimes, especially when you're just looking at them initially, they can be a little confusing: why is one wait and why is one sync?
That's a very good question.
So basically, synchronization is just, I would say,
a language feature or a runtime feature
where you can say, I have a piece of code
that only one can enter at a particular point in time
because it's handling, it's dealing with some shared resource,
and only one can have access to it at a time.
Waiting on an object is sometimes used for the same thing.
So you can actually say, hey, I have an object, a Java object, and I'm waiting on it until anybody else notifies
it. So it also can be used for synchronization. So I'm not able to pass that line of code until
somebody notifies that object so that I can get past it. But it can also be used for other
notification mechanisms. But typically, they're very closely related, I would say.
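To make the distinction concrete, here is a minimal sketch, assuming a hand-rolled producer/consumer handshake; the class and field names are purely illustrative:

```java
public class SyncVsWait {

    private final Object lock = new Object();
    private boolean dataReady = false;

    // Synchronization: only one thread at a time may execute this block,
    // so threads blocked here show up as "sync" time.
    public void updateSharedResource() {
        synchronized (lock) {
            // ... touch the shared resource ...
        }
    }

    // Waiting: the thread parks on the object until another thread notifies it,
    // which shows up as "wait" time.
    public void consume() throws InterruptedException {
        synchronized (lock) {
            while (!dataReady) {
                lock.wait(5_000); // wake up periodically to re-check the condition
            }
            // ... use the data ...
        }
    }

    public void produce() {
        synchronized (lock) {
            dataReady = true;
            lock.notifyAll(); // release any threads blocked in wait()
        }
    }
}
```

Threads queuing at the synchronized block contribute to synchronization time; threads parked inside wait() contribute to wait time.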
Yeah, yeah, absolutely.
And would you also, I know this one we're going to cover on the memory side,
but would you say garbage collection can also be considered a CPU problem?
Oh, yeah. It kind of straddles both worlds, right?
Yeah, it does, because if you are too aggressive with memory allocation
and the garbage collector needs to kick in all the time,
that obviously means that you have higher CPU usage. In Dynatrace, we actually also show that.
And thanks for that. Yeah, this is kind of a shameless plug, because it's one of my favorite features of our tool, and I don't mean to be plugging our tool, but heck, we work for them. With GC, when you're running garbage collection, you're actually consuming CPU cycles to do that, right? And if you're not able to see that specifically as GC in that kind of breakdown, you might think it's just generic CPU consumption. You'd be left guessing: at the same time there was some GC activity and there was high CPU here, so we can assume the CPU utilization was from that GC. But if your tool can actually show you this was actually GC and identify it as that, then that really just breaks it out into a much clearer and easier-to-diagnose problem. Well, obviously it's not an easier-to-diagnose problem, because then you're dealing with a memory issue, and those kinds of things always get really sticky fast.
Let's quickly cycle back to CPU hotspot.
So I always go into the response time hotspot.
I see in which layer of the app we actually have real CPU consumption,
meaning methods are just cranking on the CPU.
What I typically see is it's either methods
that really have a bad algorithm implemented, strange loops, you know, like doing too much work.
That's one thing.
So you can actually drill down to the method level, figuring out which methods need the CPU cycles.
The other option, and what I still see very often, even though we've talked about this in the industry for years, is CPU hotspots related to string allocations and string manipulations. I still see a lot of apps using string concatenation where they are building large strings, maybe even HTML pages or reports, and on the one side allocating a lot of memory because they're creating and attaching and appending strings, but it's also a lot of CPU that is consumed when string objects are copied over from one object to the other.
So this is a big thing.
So I typically find CPU problems when I look at methods
that are somehow dealing with string manipulation.
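As a hedged illustration of that concatenation pattern, here is an invented report-building example; the first method copies the whole string on every append, the second reuses one buffer:

```java
public class ReportBuilder {

    // Anti-pattern: each += copies the entire string built so far,
    // so CPU and allocations grow quadratically with the number of rows.
    static String buildReportSlow(String[] rows) {
        String html = "<table>";
        for (String row : rows) {
            html += "<tr><td>" + row + "</td></tr>";
        }
        return html + "</table>";
    }

    // Better: append into a single mutable buffer.
    static String buildReportFast(String[] rows) {
        StringBuilder html = new StringBuilder("<table>");
        for (String row : rows) {
            html.append("<tr><td>").append(row).append("</td></tr>");
        }
        return html.append("</table>").toString();
    }
}
```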
Regex would be another thing.
Regex is another hot topic, regular expression.
And we're all pros on constructing.
Well, you're actually really good at it, aren't you?
Yeah, one of our favorite things to do is construct a regular expression to capture something,
and it's always a lot of fun.
Yeah.
Someday I plan on mastering regular expressions and VI.
Oh.
Through memory.
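On the regex side, a common CPU hotspot is recompiling the same pattern on every call. A small sketch, with a made-up pattern:

```java
import java.util.regex.Pattern;

public class EmailCheck {

    // Compiled once; Pattern.compile() is the expensive part.
    private static final Pattern EMAIL =
            Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");

    static boolean isEmailSlow(String input) {
        // String.matches() recompiles the regex on every invocation.
        return input.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");
    }

    static boolean isEmailFast(String input) {
        return EMAIL.matcher(input).matches();
    }
}
```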
So, but yeah, I mean, the advice that I give people out there, because that's the way I do it.
So I figure out, do we have a CPU hotspot?
And is the CPU hotspot in either my own code, in some of the methods, in some of the algorithms that I wrote?
Or is it related to things like string manipulation?
Is it related to regular expressions?
Is it related to manipulating some other types of
objects? And typically, and that's the beauty of either a profiling tool or an APM tool like
Dynatrace, it shows you down to the method how much CPU consumption that method has, especially
in relation with everything else that is going on in the system. Are there specific, this might be
getting too specific, but are there any specific methods or
at least classes that you would see as a dead giveaway that it is a string concatenation or string manipulation issue?
Well, you know, it's obviously java.lang.String, that's clear, and then I think it's java.util.regex. I mean, it's really the base classes.
The way you see it in Dynatrace,
so if java.lang.String would show up as a problem or if a method shows up as a problem,
what you can also do is you can actually right-click on that method
in Dynatrace and say, show me the source code.
That's also beautiful because you don't have to be the developer
and actually own and have the source code available,
but because we have the bytecode available,
we can just say decompile the bytecode and show me the Java code
and so you can actually see, oh, this is the reason,
this is the loop where the method is actually going through
thousands of iterations of doing string concatenations.
Yeah, and this is, you talk about leveling up a whole bunch, right?
And we all do.
That's a very common theme in the performance world of, you know, going out beyond just writing a script and running it and handing off results.
What I love about these tools is it gives you that ability to start looking at the code behind it, right?
And in the beginning, you know, I had no idea what I was looking at. My code experience was when I was younger using BASIC, right? I had to actually write 10 PRINT "BRIAN", 20 GOTO 10, and run it, and I'd get my name going down the screen, right? I also tried... I don't know if you had them in Austria, but did you ever hear of something like a Choose Your Own Adventure book?
Yeah, where you flip between the pages depending on what you do.
Yeah, so I was trying to write one of those in BASIC, and I got to a point where I added a random element. It took me a while of going to the library, pulling out a book to see how to do the code, because again, this was all pre-internet. And once I conquered that, I got bored with it, so I gave up. But point being, I know it's a fun little sidetrack, but the point being is the more you look under the hood, the more all this stuff is going to make sense to you.
And this is how you level up.
You start understanding what the code is.
You start looking and seeing, oh, the CPU is running hot and we see java.lang.String. Now you know that problem is going to be with the string manipulation, and that then allows you to turn back to the development team or the architects or whoever you're in contact with to say, hey, it looks like we're running hot on the CPU because of string manipulation.
Yeah, and suddenly, bam, you just became a level five dwarf or a level ten warrior or whatever. What I can tell you, though, is that in most cases development will come back and say, you know what, that's not my code.
Yeah, yeah, they'll say there's nothing I can do about it, that's Java code.
But that's when you have to look above it of what's accessing it.
Exactly.
And what now comes to my mind, what you'll see a lot, especially with string manipulations, is your classical XML parsers, your JSON parsers, any type of parsing framework.
Right.
Whether it's XML, whether it is JSON.
So these are all basically frameworks that make it easy for developers to actually consume content, but internally they do a lot of stuff.
And so not that these frameworks are bad per se from a CPU perspective.
Most of them are highly optimized, but still you can use it in a way that they end up having a lot of CPU overhead.
So then it's a good argument and throwing it back to developers saying,
well, here's two options for you.
Either you go to the framework vendor,
go to their website,
and see if there is a performance problem known,
and maybe we need to just upgrade to a new version,
or learn how to better use their framework
to actually overcome their problem,
because maybe you're just calling the framework too often.
Maybe you have not parameterized it correctly so that they can actually internally do some optimizations
All right. So I would love to see the look on a developer's face if I went back to them and suggested what they should do.
Well, honestly, this is great. This is part of the unity, right?
Exactly. Building the team, building the whole DevOps feel of the company. And there's no reason that anybody in the company
should be...
A marketer,
somebody in marketing,
if they happen to be
very geeky and technical,
should be able to
bring that information
to somebody.
You use the DevOps word.
Ooh, we're really also
getting into the mode
of throwing that in
all the time.
I mean, which is true. Basically, that's where the industry is going. Whether you call it DevOps or, for me, it's like agile, right? I mean, we want to... or let's say it that way:
We are there to build better software
and we all have to step up a little bit
and just look out of the box a little bit.
That's what it is.
Right.
It's no longer pointing fingers
and I have mine, you have yours,
and I'll do my thing.
Everybody kind of helps everybody out
and there's no blame.
So CPU again, to kind of sum it up, figure out which methods consume CPU.
If it's not your own code, figure out who is making the call,
and somewhere up the chain, the executions chain,
you will see your own code and maybe it goes through a framework.
Then figure out which framework it is.
Figure out if there's a newer version available.
Figure out if there is any documentation, how to optimize performance in these frameworks.
If it is your own code, then if you use Dynatrace, for instance, right-click, show me the source
code. You'll figure out, oh, it's a loop or it's something else. Show it to the developers.
They will love you for it, hopefully love you for that. And yeah, well, that's the way it is, right?
So I think that's it from a CPU perspective.
So check out the response time hotspot.
Check out the method hotspot.
These are the dashlets that I love the most.
And also what you should do if you're running performance tests,
I think you want to always figure out does the CPU behavior change over time
when you put different types of load on the system,
especially also if your application is dealing with different sets of data?
So it could be that you never see a CPU problem because you're always testing against an empty database,
but then you are testing against a prime database, or if more people are on the system,
and then your app is getting some additional data on the current activity,
and you just have more data to process, then you actually see all the CPU hotspots.
So not just do it on a sample database.
Right.
We covered that a little bit.
Was that the last?
I think so.
I think it was.
That was probably the episode where we had Mark Tomlinson on the call.
Right, right.
Good old Mark.
Yeah.
Marky Mark.
Yeah, just make sure your testing environment is realistic.
Exactly.
And that in itself is a whole different topic, right?
Yeah.
That can be difficult too.
But yeah, you're not going to always see these problems
if you're not testing with the right conditions.
And were there any, you know, I know you're kind of saying
this is the biggest thing with the CPU.
We did mention synchronization and wait.
Are there any big things on those sides, or is that a lot more individualized, things you're just looking out for?
Well, what I'm looking out for, if somebody's testing an app: I think synchronization problems and waiting issues are something you cannot detect with a single-user test, obviously.
Right.
So you have to have some load on the system,
but you might be surprised just with two users.
If you're using, let's say, if you have JMeter, right,
if you have a nice script,
then just crank it up with a couple of virtual users.
Run them in parallel.
It doesn't have to be a lot.
Typically, you find basic synchronization problems
with two or five concurrent users on the system
or simulating the same request in parallel, and then watch out.
And what I really love, what I typically do if I have the chance to run a load test,
so I crank it up from one user to five to 10, 20 over a certain amount of time,
and I watch the metrics over time: how much CPU do we consume, how much time do we spend in sync, how much time do we spend in wait? Because then I can immediately see if we actually have a synchronization or wait problem with increasing load, because a perfectly scaling system will just consume more CPU in proportion to the amount of traffic that is coming in. But if you see that you are shifting over to more waiting and synchronization,
you know this becomes a bottleneck
because the more people that come in,
the more you're syncing.
And that's basically then an architectural issue
where you can say, hey, guys,
with 10 users, we already see 50% synchronization time.
That means we have a serious problem here.
So look at these measures.
In Dynatrace, you can chart.
I think we automatically spit out CPU, sync, wait, all this as measures.
You can just chart it over time and just chart it while you're on the load test.
Right.
And in terms of the wait one, that one always gets a little bit fun, right?
Because there's a couple different patterns.
If you're in an asynchronous thread and you're in wait, that could be okay.
It's asynchronous, and it could just be sitting there waiting for the next thing to pick it up and run with it.
So when you're looking at those wait things, if it's in the synchronous part of the thread, that's when you know you definitely have some kind of an issue. But I've also found that waits can sometimes be fun to talk to developers about
because, like, oh, look, we got stuck in wait for five seconds.
And sometimes their first response is, well, yeah,
we have a five-second timeout on the wait before it tries again.
Not understanding, no, but why are we waiting five seconds?
What got us into the condition of waiting five seconds?
So waits can be a little tricky to communicate as an issue to be dealt with
because people develop code for waits
under certain conditions, so it doesn't break.
But you really just want to get the message across
that, but it's going to those wait states quite a lot now,
whereas it wasn't under one user, two users,
now that we have that concurrent load.
So it can be a little tricky.
It can be very tricky, especially... I mean, I think a lot of developers and architects use it as a way to handle, as you said, asynchronous activity.
And you make an asynchronous call, and then you wait.
But basically, you're blocking your current thread in the caller.
So it would be much more efficient to go with an event-driven model where you actually say, well, I'm doing my stuff.
Now I am triggering off an asynchronous call, but I'm freeing my own thread up for handling
the next incoming request.
And when my initial asynchronous call actually comes back in an event-driven system, I will
then continue with the tasks that I have to do.
But as you said, this is a much tougher discussion than saying fix it because this typically
means a total change in architecture.
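A rough sketch of the contrast being described, using CompletableFuture; the service call itself is hypothetical:

```java
import java.util.concurrent.CompletableFuture;

public class OrderService {

    // Blocking style: the calling thread is parked until the result arrives,
    // which shows up as wait time under load.
    String enrichOrderBlocking(String orderId) throws Exception {
        CompletableFuture<String> details = fetchDetailsAsync(orderId);
        return details.get(); // thread sits in WAITING here
    }

    // Event-driven style: register a callback and free the caller thread immediately.
    CompletableFuture<String> enrichOrderAsync(String orderId) {
        return fetchDetailsAsync(orderId)
                .thenApply(details -> "enriched:" + details); // runs when the result is ready
    }

    // Hypothetical downstream call, e.g. to another service.
    private CompletableFuture<String> fetchDetailsAsync(String orderId) {
        return CompletableFuture.supplyAsync(() -> "details for " + orderId);
    }
}
```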
Right, right, right.
But you know what I like?
What you said is very good.
In the very beginning, you said don't always get fooled by waits, because we see a lot
of waits, especially when you're spawning threads in Java.
Now, if you're taking Tomcat or any other app servers and you basically have incoming
queues, so your threads that are waiting for picking something up, they will wait.
So it's totally normal that they wait because they have to wait for incoming traffic.
Correct.
So don't get fooled by that.
Yeah, I think synchronization is a lot more of a red flag right away than waits.
Again, that's where the leveling up comes in, where you have to figure out what it's doing and why, and if there's a good reason for it, or if it's actually impacting the transactions themselves and causing problems down the line.
So it gets a little bit trickier, but that's when you take your PurePaths and send them to Andy.
Exactly.
But yeah, that's hopeful.
But on the other side, really challenge the developers and show them: hey, with increasing load we see wait time here, and is this wait intended or not? Often it's used with a timeout setting, right? We have a timeout setting, so we make the call, then we just wait for five seconds and then check if the response is here. And if not, then wait another five seconds. And that might be an approach that was
good in the past. But if you cannot change the architecture, you should still think about,
do we really wait for five seconds? Because maybe the response is already here after 100 milliseconds.
So we are basically wasting 4.9 seconds.
So you really need to figure out how to change these timeout settings.
Maybe you want to make them much shorter.
Because if you know the response is, on average here, much faster, then you need to adapt your timeout settings for that.
Right. Why wait five seconds
when you can go right away? Or, you know, if your timeout is 500 milliseconds, then maybe you go through four waits until it picks up, but that's fine. It's better than five seconds.
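A minimal sketch of that idea, waiting in short slices under an overall deadline instead of a fixed five-second block per retry; the names are made up:

```java
public class ResponsePoller {

    private final Object lock = new Object();
    private boolean responseReady = false;

    // Instead of lock.wait(5000) once per retry, wait in short slices so a
    // response that arrives after 100 ms is not stuck behind a 5 s timeout.
    boolean awaitResponse(long overallTimeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + overallTimeoutMillis;
        synchronized (lock) {
            while (!responseReady && System.currentTimeMillis() < deadline) {
                lock.wait(100);
            }
            return responseReady;
        }
    }

    void onResponse() {
        synchronized (lock) {
            responseReady = true;
            lock.notifyAll();
        }
    }
}
```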
Yeah, exactly. And another thing that just comes to my mind: synchronizing wait settings across the different tiers. Because typically, you know, you have tiers talking with each other, and you want to make sure that your timeout setting is the same as on the other side.
So, for instance, if I call an external service and I give it 60 seconds timeout,
but on the other side, the service that picks it up is automatically throwing an HTTP 500 anyway
after 30 seconds if they're not done, just like maybe the default setting by the app
server, then I need to sync it, because otherwise I'm just waiting 30 additional seconds for something that I know is here after 30 seconds at the latest. So synchronize the wait times and the timeouts.
Yes, but synchronizing used in a different way.
Exactly. You're reminding me, I know it's not my first language.
No, I mean, it's the same.
I would use the same word, too.
I think it's just separating it from a synchronization problem versus synchronizing meaning like,
hey, let's all synchronize our watches because we're going to go spy at the embassy and make sure we get the top secret documents.
Which embassy are we spying on this week?
What's the one?
There was an old Peter Sellers movie called The Mouse That Roared. I forget the name of the fictional country. If you haven't seen that movie, though, go see it. It's an old black-and-white Peter Sellers movie.
So was that before my time, I guess?
Did you ever hear of the original Pink Panther movies?
Oh yeah, yeah.
So the guy who played in the Pink Panther was Peter Sellers. Another great movie he was in was Dr. Strangelove.
It was a Stanley Kubrick movie.
I was a film major before that.
Anyway, yes, this is, well, before my time, too.
These are all made before I was born.
So should we reveal the age now?
No.
No, that could be one of the, well, I guess maybe anything else on CPU or no?
I think that's it.
So just remember, check out the response time hotspot, the method hotspot.
We show you CPU, I/O, sync, wait, and garbage collection too.
Right.
And if your tool doesn't break it down that way, you know, you could still figure this out.
You just have to really look at what those methods being invoked are doing. It's going
to take a little more intuition, but it's definitely something you can do or should be
able to do with any kind of either profiler APM tool. But again, once you're into multi-nodes
or microservices, you really want to kind of have some sort of APM tool in there to really
give you that full vision. But speaking of the age thing, so we're going to introduce a new segment to our show.
It's the trivia segment.
And borrowing from Marvel Comics and the old days, they used to give out a no prize, N-O
prize, for people who would catch inconsistencies in the storylines from different episodes. Ours is going to be the K-N-O-W Prize,
where basically if you are the first person to get the answer correct
by tweeting your answer to Dynatrace with hashtag pure performance and no prize,
although if you just get pure performance in there, I'm sure we'll pick it up,
but try to get the two of them in there. We will put your name as the KNOW Prize winner on the episode on the site. So what we're going to try to do with these trivia questions, though, is make it so you can't just Google them. And even if you can Google them, or Bing them, right...
Or Bing them. You cannot use a search engine. You're very Bing-friendly.
No, the thing is, the reason why I am is because recently somebody pointed out to me,
you always only mention Google.
And it was a Microsoft person that said that to me. Well, Google is more of a verb at this point.
I know.
Yeah, point taken.
You can use your favorite search engine.
Here we go.
Yes, WebCrawler.
Basically, don't do that.
Even if it is one you can search, please don't Google it.
Please don't search it.
Try to use your brain or make some fun guesses because it'll be a lot more fun for everybody that way.
And you're not really getting any real prize anyway except for your name on a website, which I guess isn't that exciting.
Well, it is.
Come on.
It's a pure performance website.
What's wrong with you?
Your name will be in the internet.
Exactly.
So in the tradition of something that you cannot easily search, the first trivia question,
first the inaugural trivia question of pure performance, ladies and gentlemen,
maybe I'll see if I can get a drumroll sound effect.
There we go.
That's better than any drumroll sound effect I can possibly find, is what is the first computer I, Brian Wilson, ever used?
Now, I'm not talking about a calculator.
I remember when I took a programming class in college, the professor was like, anything with a chip is a computer, technically.
So I'm not talking about something like a calculator or a stopwatch or a wristwatch.
I mean an actual computer that has a keyboard and you can type stuff in on.
So if you can figure out what that first one was that I used, then you'll get your name up there.
That's awesome.
And I want to give the audience a hint, though, okay? Because they should know: are you 20, are you 30, are you 40? I can see that your hair is already a little whiter. That's a lot whiter than mine.
Yeah, so I am 42.
Oh, you know what, that just ruined another question. Well, it didn't ruin another question, but bonus prize, and this is a real easy one, low-hanging fruit for any true geeks out there: the bonus prize is, what is 42? But the real prize is, what's the first computer I used? Hey, do you have any idea what 42 is?
Probably not. I... I think I have an idea, but I don't want to spoil it. I don't want to give the
audience the answer.
I just didn't know if you had any idea, if you were a true geek or not.
Yeah.
A true geek who... yeah, I guess you'd have to have a very strong command of the English language and have grown up with it, which you mostly have, I think.
Anyhow.
So that's it.
So, again, tweet your answer to at Dynatrace, hashtag pure performance, hashtag no prize, K-N-O-W-P-R-I-Z-E.
And the first one we get, we'll get their name.
If we could use the, was it flash or blink tag from way back, we would do that.
But they killed that one, unfortunately, long ago.
It was a great one.
I remember it was one of my first websites I built.
Was it in GeoCities?
GeoCities always had the best websites.
No, I remember when I was in high school and we built our first HTML pages.
That was awesome.
With the blinking text and then with the text.
What was it called?
The scroll.
The banner, the scroll, yeah.
It was awesome.
And all those great sites were usually built on GeoCities,
and the page would load and it would have some kind of maybe a dancing cat image,
like a drawn one, and there was a MIDI file playing some really bad music.
Yeah, yeah, yeah.
Those were the really good old days of the Internet.
Anyway, so let's go into the next topic.
So the next performance topic we're going to talk about is memory.
Yeah.
So on the memory side, in Java in general, I mean, memory is mysterious, I would say, and it's not that easy. There's a whole science, I think, about optimizing memory usage and garbage collection. We cannot go into every single detail, but we should cover it a little bit. I think, in general, there are two big things in memory.
On the one side, it's obviously the classical memory leak,
meaning memory is growing and growing and growing,
and then the garbage collector tries to clear it up,
but at some point, what happens?
It runs out.
It runs out.
That's awesome.
So what happens when it runs out?
It crashes.
It crashes.
So before it crashes, what happens?
Well, you'll see the CPU spinning up really high.
Everything will slow down tremendously, and everyone will be really mad.
And you'll get alerts and alerts, hopefully.
Exactly.
Hopefully you get alerts.
So hopefully, typically, you'll get an out-of-memory exception.
So Java throws an OutOfMemoryError,
and then if it cannot really recover, then it just really crashes.
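If you want the JVM to write a heap dump for you at that point, the standard HotSpot flags look roughly like this; the heap size and path are placeholders:

```
java -Xmx2g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/heapdumps \
     -jar myapp.jar
```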
And before you go on, though, I do want to say, and we're not going to go into this at this point either,
but you're talking about a memory leak.
We're talking about a heap memory leak.
Exactly.
There's also a native memory leak, which is very, very difficult.
But it's also important in a Java world, because obviously not all code that runs in your Java app is Java code running within the JVM; it may load some native libraries and therefore allocate native heap space.
So that's why also in Dynatrace, when you do a memory dump, we actually show you how much heap memory do you have used and what's the overall memory consumption of the process.
And basically the difference
then is part of the
native memory. And then we actually show you how the native
memory grows over time and whether you have
a native memory leak, which could be you may
have brought in some external library that is
doing something natively and they keep allocating
memory and therefore
you run out of memory. But let's go back to the Java heap space.
Yeah. So memory leaks are a classic one, and basically what you do is keep watching your memory counters. So for Java, you have the different heap spaces, right? What do we have, Brian?
Um, yes, well, it depends on which Java you're using. Java, right? Do you still say Java?
Java, yeah.
That's a quick sidetrack. What always really impressed me about the whole Dynatrace team in Europe, with English as a second language, is they somehow picked up the Boston Java.
But anyway, what do we have?
We have Perm.
I'm blanking on this right now.
We have Perm.
Eden.
Yeah, Eden Survivor.
Tenured.
Yeah, Tenured.
But there's different ones with some different frameworks.
It depends on which JVM, right?
It depends if it's the Oracle JVM, is it the IBM JVM, is it the SAP.
There's different JVMs out there.
They call it different.
But basically, it is heap generations.
Correct.
From young objects to old objects.
And what you want to do, you want to monitor all of these heap spaces.
And it's all exposed through JMX, typically. That means you can see how much utilization you have, and you can also see how much garbage collection happens in each of these spaces. And if you basically see constant growth of the survivor space, which is basically where objects end up if they cannot get cleared, and it reaches a certain level, it reaches the top, garbage collection kicks in, and then it crashes, then you know you have a classic memory leak.
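Those counters are also available programmatically through the standard java.lang.management MXBeans, so a rough self-monitoring sketch (pool names differ between JVM vendors) could look like this:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class HeapWatcher {

    // Prints usage per heap pool (Eden, Survivor, Tenured/Old Gen, ...) and GC totals.
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                System.out.printf("%-35s used=%,d bytes%n",
                        pool.getName(), pool.getUsage().getUsed());
            }
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%-35s collections=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(10_000); // sample every 10 seconds
        }
    }
}
```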
Because typically what you should see if you don't have a memory leak
is the classical, what's it called, the sawtooth?
Yeah, I call it, is it a sawtooth?
Yeah.
It's a sawtooth pattern, yeah.
So that's what you should,
because basically memory grows.
It grows because your application is allocating memory, and then the objects that are no longer needed eventually get garbage collected, so you should always come back to the bottom of the valley. And garbage collection is fine, garbage collection is necessary. You're never going to see no garbage collection, but you shouldn't see it running for so long that it's impacting your performance, right?
So typically, you know, you were talking about the survivor space.
That's where memory goes.
I don't want to say to die, but that's where it ends up when it's something that's used for longer periods of time. But those garbage collections usually have much more of an impact.
So you should be seeing a lot of, in a really good system, you'll see, you might end up seeing a lot of GC invocations,
but they're going to be in the Eden space and they're going to be really short and fast and
they're not going to have that impact. Of course, you know, by the design of an application, the whole idea is, as soon as an object is no longer needed, it should get dumped, right? It's only when those objects are needed longer that they sit around longer and end up going into those... what is it after Eden, Survivor?
Exactly, and then Tenured. But there always has to be a good reason for that, and that's where usually these problems come in, where people forget about these objects.
Yeah.
And typically, so yeah, watch out for them: if the objects are promoted into the different higher regions,
higher heap spaces,
and then if it keeps growing and doesn't come down after a garbage collection run.
So the typical things that I've seen is, and as you said, there's a good reason why objects are living up there
because you have caching frameworks that cache objects.
That's why these objects stay there.
They memory cache.
Oftentimes you'll see that building quite a lot during startup of the application
because it's got to load everything in, and you might see a quick ramp,
but then again you should see it stabilize.
Exactly.
But the thing is if you have any configuration issues with that caching,
that means if the caching strategy itself or the caching framework
is never actually allowing very old objects to fall out of the cache
and not basically understanding that if it keeps putting more and more objects up there,
that it's eventually crashing the system.
That's a classical memory leak.
So either buggy versions of caching frameworks,
misconfigured versions where you don't have any expiring policies for these objects in the cache.
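A minimal sketch of that classic pattern, assuming a hand-rolled cache with no eviction policy; the class and method names are invented:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProductCache {

    // Leak: a static map that only ever grows. Every entry stays reachable from a
    // GC root forever, so these objects get promoted into the old generation and
    // survive every collection until the JVM runs out of heap.
    private static final Map<String, Product> CACHE = new ConcurrentHashMap<>();

    static Product lookup(String id) {
        return CACHE.computeIfAbsent(id, ProductCache::loadFromDatabase);
    }

    // Hypothetical loader; in a real app this would hit the database.
    private static Product loadFromDatabase(String id) {
        return new Product(id);
    }

    static class Product {
        final String id;
        Product(String id) { this.id = id; }
    }
}
```

A bounded cache, or one with an expiry policy, turns the same code into normal memory usage with the sawtooth pattern described above.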
So this is then typically very interesting.
How can you find out about it?
You can take a memory dump.
So you can either take a memory dump when you see, oh, we're reaching that point,
or if you're running and testing, you run some load, you see the memory going up,
then take a memory dump.
Another option would be wait for the system to crash,
and if you have the chance, then the JVM actually takes a memory dump for you.
That's one option.
In Dynatrace, we also have the feature
if an application crashes and we get the chance,
then we actually capture the memory dump.
Right.
And then actually look at it and see,
hey, which objects are out there?
Because we actually see which objects are still on the heap,
and then typically you find it's probably your own objects
that are then referenced by one of these caching frameworks.
Right, and the other fun thing you see quite a lot of
when you take a memory dump is a lot of string references
tying it back to the CPU, right?
Well, the reason why that is, obviously,
is because if you look at an object,
if you look at a business object,
let's say a person or an order,
what is an order?
An order is a set of values.
And what are these values?
They're either strings or integers.
So in the end, obviously, the most consuming part on the heap are going to be these primitive types.
But they are referenced from these complex objects, business objects.
Right.
And that's actually one of the tricky things about memory, at least in the beginning, is when you're looking at a memory dump, oftentimes your largest consumers of memory are going to be these placeholders for strings and integers and all these other components.
So you might initially say, oh, it's a string issue, right?
But it might not.
It's usually not because you're usually going to see quite a lot of that.
You've got to look a little bit down in the heap oftentimes.
But there are quite oftentimes when it is the string.
But the nice thing about this, so what we try to solve, and I'm sure other memory profiling tools do the same thing,
but we actually show you how much memory is referenced by these business objects.
So we can actually tell you, well, the object itself
doesn't have a whole lot of memory,
but because it references 50 strings
and 100 integers,
the total garbage collection
or the total memory it actually holds onto
is X amount.
And that's why I love the Dynatrace memory dashlet
where you can actually say,
show me the objects that are responsible for keeping how much memory on the heap,
and whether it's that object itself or the object's references.
Yeah, and I think another fun thing when you're taking a dump,
and I haven't used a lot of memory tools outside of Dynatrace,
so I don't know if this is common to a lot of them,
but I love the concept of whether or not you trigger a garbage collection
with the dump or not.
Right?
Because that's going to at least, you know, if you want to see,
if you don't trigger, there's different reasons to do either one, right?
Is this a common feature or is this something that's just kind of?
You know, I think I'm just the same way as you are. We're so used to Dynatrace because we have it available, because once Dynatrace is in a Java app you can just click a button and we get there. I'm pretty sure that this is a feature that most tools have out there, and it's great. And it comes kind of to the second memory problem that people have. But let me just finalize on the memory leak.
That's good.
So to finalize on the memory leak: figure out which objects are on the heap, which of them take most of the memory, and don't get fooled just by strings, as you said, Brian, correctly. And then walk back the referrer tree. So if you do a full memory dump, you can actually see who is referencing it, and then you will see, oh, okay, my business objects are holding a gigabyte of memory, and who is referencing them? Who is holding them? And then you can typically step back all the way until you find a global array,
with a framework that you're using, or maybe it was your developers that actually put stuff
into these objects.
Another good example are session objects.
So if you have a web application, you keep your user objects, your user sessions, and if you add more data to the session, then these sessions grow and grow and grow over time.
And these sessions are kept in memory by the application server, depending on the timeout
setting you have for your user sessions.
Typically, I think 30 minutes these days.
Yeah.
And that's why, you know, these objects grow and grow and eventually bring your application
to crash.
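A hedged sketch of how that session bloat typically creeps in; the servlet and attribute names are made up:

```java
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.util.ArrayList;
import java.util.List;

public class SearchServlet extends HttpServlet {

    @Override
    @SuppressWarnings("unchecked")
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        // Every search result list is parked in the session "for later".
        // With a 30-minute session timeout, each active user holds all of
        // this on the heap until the session expires.
        List<List<String>> history =
                (List<List<String>>) req.getSession().getAttribute("searchHistory");
        if (history == null) {
            history = new ArrayList<>();
            req.getSession().setAttribute("searchHistory", history);
        }
        history.add(runSearch(req.getParameter("q")));
    }

    // Hypothetical search; imagine it returning the full result set.
    private List<String> runSearch(String query) {
        return new ArrayList<>();
    }
}
```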
And one thing you can do when you're looking at those as well is do a search on the memory
or the caching mechanism.
Like, I remember looking at one about two years ago.
It was HashMap, right? And I don't think I'm going to remember the exact component of it, but basically HashMap was meant to be used in one way, but it was very commonly used in another way to achieve another thing, just because it could do it, right? And people were like, hey, we can use HashMap to leverage this, and that's awesome, hijack another framework to do something else if it's really efficient at it. But what I remember looking at when I looked it up,
when I did a search engine look for it,
was if you're going to use it for that alternate component,
there's a major setting you have to change on it
because otherwise it's going to lock everything up
and be allocating everything into long-term storage on it.
And I believe in that case that was what it was.
But again, I'm still not, I would never consider myself a memory expert.
But at that time, I knew even less.
But just doing a quick search on seeing which, you know, some of the identifiers that you're seeing in that heap oftentimes can help point you to what maybe some of those common mistakes might be to maybe not here's the solution to everybody, but more of here are some ideas we might want to consider to look at,
just to help the rest of the team out with some ideas there.
I love that you mentioned hash maps because basically hash maps, hash tables,
they're all collections, basically collection objects.
And if you do a Dynatrace memory dump and you look at the memory dashlet on the bottom,
remember the tabs that you see?
We show you obviously the biggest objects,
but then really focusing on the biggest collections,
and then basically tell you exactly if you have a problem there.
And I think we also show the biggest user objects,
like session objects on an application server,
and our favorite, again, the strings.
All the strings and
the duplicated strings. That's crazy, because the same string might be allocated 50,000 times and you don't even know about it, because your frameworks are going like crazy.
Yeah, yeah.
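As an illustration of the duplicate-string problem, here is an invented row-mapping example; interning is just one option and has its own costs, so treat this as a sketch rather than a recommendation:

```java
import java.util.ArrayList;
import java.util.List;

public class OrderStatusLoader {

    // Hypothetical row mapper: each row gets its own copy of "ACTIVE"/"SHIPPED",
    // so the same few status values can exist hundreds of thousands of times on the heap.
    static List<String> loadStatusesNaive(List<char[]> rawRows) {
        List<String> statuses = new ArrayList<>();
        for (char[] raw : rawRows) {
            statuses.add(new String(raw)); // a fresh String per row
        }
        return statuses;
    }

    // One option: intern (or use a small lookup map) so identical values share one instance.
    static List<String> loadStatusesDeduplicated(List<char[]> rawRows) {
        List<String> statuses = new ArrayList<>();
        for (char[] raw : rawRows) {
            statuses.add(new String(raw).intern());
        }
        return statuses;
    }
}
```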
So, the second memory problem.
Yeah. So you mentioned the way you triggered it before, which you did pretty well. You were saying there's an option in Dynatrace where you can say, before I create a memory dump,
I want to trigger the GC
so that I basically see
which objects are actually staying on the heap
after the GC has done its work.
Correct.
So do you want to tell the audience
what you think while we have this feature?
Yeah, so listen,
if you're going to take a memory dump
and not clean out your GC, especially if you're looking at overall consumption, right, or if your memory is pretty large at the time: if you take a memory dump and you don't run the GC first, you don't know what's about to get cleared. So you're potentially looking at a lot of extraneous objects in memory that have no impact on the problem of, you know, large
memory consumption. So if you run that GC beforehand, you're going to be looking at everything
that's stuck in the system that can't get cleared out, that is still being leveraged and utilized.
And if you see something like, you know, an order ID in survivor space, then you know very easily that you have an issue.
But there are the times when you not necessarily
would want to trigger a GC,
and I'm blanking on it now.
I've explained this to people in the past,
and I just can't...
Help me out here.
Why would you want to trigger the GC?
I'm seeing you chewing your teeth.
So basically, here's, I think, for me, the second thing.
I totally agree with you, what you said on why garbage collection.
Now, the second thing why we actually see a lot of GC activity is if you have a high object churn rate,
which means your application is actually allocating
a lot of objects that are short-lived.
So that means you're allocating them,
they live for a little while, and they get garbage collected.
So in this case, I don't want to run the GC, because I want to find out if we're allocating too many objects, even though they're garbage collected all the time. This is another problem pattern. So we can find out about it by creating memory dumps on a consecutive basis but not triggering the GC, because you want to see how many of these maybe small, short-living objects we have on the heap. And we do this over time, and then we can figure out, wow, we have this XML parser object, and we
have 10 000 instances of it coming and going, coming and going, coming and going.
And it means, well, they're not a memory leak,
but they mean that the garbage collector
always has to clear a lot of objects.
Right, right.
And basically, by knowing that,
we can talk to the developers,
hey, maybe instead of always allocating
a new XML parser object,
maybe we can reuse them.
Maybe you create one XML parser object per thread,
and then you solve at least concurrency issues
instead of every time a request comes in,
creating a new instance of it that is very short-living.
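A rough sketch of that per-thread reuse idea using ThreadLocal with the standard JAXP DocumentBuilder; treat the structure as illustrative:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class XmlParsers {

    // One DocumentBuilder per thread instead of a new one per request.
    // DocumentBuilder is not thread-safe, so ThreadLocal gives reuse
    // without adding synchronization, and it cuts the object churn.
    private static final ThreadLocal<DocumentBuilder> PARSER =
            ThreadLocal.withInitial(() -> {
                try {
                    return DocumentBuilderFactory.newInstance().newDocumentBuilder();
                } catch (ParserConfigurationException e) {
                    throw new IllegalStateException(e);
                }
            });

    static DocumentBuilder parser() {
        DocumentBuilder builder = PARSER.get();
        builder.reset(); // clear state left over from the previous request
        return builder;
    }
}
```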
And this kind of points back to, I believe,
the difference between having a memory leak
versus the memory problem where your GC is too high.
Exactly.
And you can have excessive GC without a memory leak,
and this is the exact kind of a case where that would be happening, where you're just
putting way too many items that are going to get cleaned up into the system.
And GC is just always, always, always running.
And again, garbage collection is consuming your CPU.
So if that's going too heavy, it's going to impact your code.
And if you're not looking at the garbage collection, you'll be looking at your code.
I've had developers, before I had insight into garbage collection, saying: but this code is not doing anything with string concatenations or string manipulations.
This is not running anything on CPU.
So why would I be, there's got to be something else hitting the CPU up high.
And if you're not, again, thinking of garbage collection,
you might make the mistake of first going,
oh, what else is running on the box, right?
But stick with the application first before you look outside
and you have to look at those garbage collections.
I think the technical term or the industry term
is called object churning.
So through how many objects does the GC churn through
all the time because you're allocating so many of them?
So, and yeah, I mean, I think that's the two major things,
memory leak and high object churn rate,
which in both cases lead to garbage collection,
but the one is more like the signal before the thing dies, right?
Yeah, think about it as your garbage can at home.
Yeah.
The memory leak is people keep putting stuff in the trash
and no one wants to take the trash out,
so it starts overflowing.
And then whoever puts the last piece in that overflows,
it gets in trouble, right?
And in this case, your app dies.
Yeah.
And the other one is basically
we're just throwing everything out all the time
and then we constantly have to run and run and run.
And basically it's a lot of overhead, so maybe we should throw less things away.
I mean, yeah, less garbage.
I'm just going back to an example of at your home or apartment or something.
I don't know why I'm going this route, but another idea popped into my head where you can think of the garbage collection without the memory issue: it's when you're taking a shower and someone else goes ahead and flushes the toilet. You have too much cold water being flushed away, and then you get burned and scalded, because you just keep on dumping all the water down the drain.
All right, so there's... bringing it back to how your brain works?
You don't want to know. I sometimes don't understand how my brain works.
So there's obviously more stuff that we could talk about with memory.
But what I want to remind people, we have an excellent Java performance book online.
I think if you Google for Java performance book Dynatrace, you will find it.
And Michael Kopp, back in the years, three or four years ago, he wrote the chapter on memory and garbage collection.
And he really did a phenomenal job explaining
how memory is actually managed by the different JVMs,
the different garbage collection options that you have,
because we haven't even talked about different ways
we can optimize the garbage collector as well.
There's different modes.
Yeah, there's a whole...
That's a whole science.
Yeah.
So check it out.
Check out the Java Performance book
and look at the memory chapter,
and also we have blog posts on blog.dynatrace.com about real-life scenarios where our customers actually showed us how they found memory leaks in their different environments.
Great, great.
And again, it's not just if you're, you know, yes, we'd love for you to be using Dynatrace, but these are articles and ideas that are going to help you no matter what you're using, no matter where you're working.
This is core stuff.
Exactly.
Right?
And I'm glad you mentioned the Java performance book because that's actually where I first started reading in depth about memory.
And probably the reason my mind was drawing such a blank is because I need to go back and review it some more.
And some of the stories... I remember one, an Oracle JDBC driver memory leak that brought a whole IBM WebSphere cluster down.
And so these are the stories that we have out there and kind of showing how it works.
I have a video on my YouTube channel, which is a 15-minute sanity check on Java memory.
What's your YouTube channel?
My YouTube channel?
Oh, so I have my bit.ly, so bit.ly slash DT tutorials.
All one word.
DT is like in Dynatrace, DT and tutorials.
And is that clear enough?
Yeah, DT tutorials.
So bit.ly slash DT tutorials.
That's awesome.
Because I don't mumble at all.
So between.
No, I do.
I'm joking.
I mumble quite a lot.
I try not to when I'm doing this.
But between your accent and my mumbling, I'm sure this is the...
I hope so.
They figure it out, right?
Give it a couple of trial and errors.
Yeah, but that's good.
And remember kind of maybe to remind people that we also have the Dynatrace free trial that you can download.
So I'll say it once and then you repeat it so that people really get it: bit.ly slash DT personal.
So that is bit.ly slash DT personal.
Exactly. So that basically brings you to the registration page for the Dynatrace free trial slash personal license. We call it personal because it becomes personal after the 30 days. So register that.
And yeah, if you have any questions or feedback, you can always email us at pureperformance at dynatrace.com. If you want to follow me, I'm not too active on Twitter, but I'm slowly trying to become more, but I
am Emperor Wilson, E-M-P-E-R-O-R-W-I-L-S-O-N. And we have... what's that? I always forget yours.
It's okay. It's GrabnerAndy, so it's G-R-A-B-N-E-R-A-N-D-I.
Correct. That's Twitter, and there's also, of course, Dynatrace on Twitter. And don't forget, if you know the answer to our trivia question today, what was the first computer I ever used,
remember to send that to at Dynatrace, hashtag pureperformance, hashtag no prize, K-N-O-W-P-R-I-Z-E.
And anything, you know, we are, just so the listeners have an understanding of where we are, we are still in pre-production on all this.
I'd like to give a little bit of the background just because, well, that's always been the stuff that fascinated me.
These will all be soon very, very much going live, and we're very excited for that to be happening.
So if you've been listening and following us, we thank you very, very much for that,
and we'll continue to have a whole bunch of these coming out.
We are up on Spreaker,
and we also have a page on Dynatrace that's being worked on,
so I cannot give out a URL quite yet,
but I'm sure if you are listening to this
and you search Dynatrace Pure Performance,
it will come up in your favorite search engine.
Exactly.
And I think from talking about Java performance, hotspots, performance problems,
we have a long list.
I think we did not get through the database because it's too long.
But databases, web services, messaging, message queues, threads, pools,
these are all topics that are very hot that we should cover in one of the upcoming.
Right.
And if there's other ones that you're interested in and you'd like to hear some ideas on, please
communicate to us in any of those methods I mentioned before.
Even general show ideas, too, if there's something you find fascinating, like cloud.
How do you deal with cloud?
Or any other kind of topics that you think it would be good to hear more about, send us a note. You know, we have a lot of people we can pull in for a lot of great conversations, so we'd love to hear ideas from you.
And what if people want to be on air with us?
Well, they have to get a broadcast license. Now, if you want to be on air, listen: if you
have a great knowledge base or some
great experiences, let's say you became the performance hero and tackled some amazing
performance issue, might've been one of these common ones, but you have a personal experience
of how you went about doing it. Any, you know, if you have ideas and you'd like to be on air with
us and talk, yeah, send us an email, pureperformance at dynatrace.com,
and let us know what you're thinking,
and we'll try to work that out.
And we promise that we don't make jokes about them,
and we don't harass them, right?
We promise to be good.
Yeah, unless we know you.
Of course.
If we know you, you're going to get harassed.
But if we don't, we'll be very, very nice to you,
and we will give you some virtual chocolates somehow.
I don't know what that even means.
Again, there's my brain going in some way.
Anyhow, we look forward to any feedback, any ideas or suggestions, and I will hand it over to you, Andy, as he's logging into his laptop. Any final thoughts before we sign off?
Any final thoughts?
No, I hope the weather is getting better because it's really cold out there
and we need to walk over to the meetup later.
And other than that, I think just keep sending us these stories,
especially me.
Now, this is not my personal interest.
Send me these PurePaths.
Use my Share Your PurePath program.
Share any other stories.
If you are in the
area where I am, follow me on Twitter. I typically post where I'm traveling to, right? Look me up and just hunt me down. I'm always happy to share a beer and obviously a lot of stories with you over a beer. Or find me on one of the dance floors. I typically go out salsa dancing in the cities I visit, so if you have any recommendations on salsa places in your city,
let me know.
And then we'll find an excuse that I get there.
Of course.
Yeah.
All right.
Well,
thank you all very much.
We'll see you next time.
Goodbye.
Goodbye.