PurePerformance - Code as a Crime Scene: Diving into Code Forensics, Hotspot and Risk Analysis with Adam Tornhill

Episode Date: September 30, 2019

Are you analyzing the dependency between change frequency, technical complexity, and the growth in length of code change hotspots? You should, as it helps you tackle technical debt and risk assessment the right way!

In this podcast, Adam Tornhill (@AdamTornhill) explains how he applies data science and forensic approaches to data we all have in our organizations, such as Git commit history, ticket stats, static & dynamic code analysis, and monitoring data. He gives us insights into detecting code hotspots and how we can leverage this data in areas such as risk assessment and the social side of code changes, as well as explaining to the business why working on technical debt is going to improve time to market.

Also make sure to check out CodeScene: a powerful visualization tool that uses predictive analytics to find social patterns and hidden risks in your code.

Adam Tornhill on Twitter: https://twitter.com/AdamTornhill
Adam Tornhill's blog: https://www.adamtornhill.com/
CodeScene: https://www.empear.com

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to another episode of Pure Performance. My name is Brian Wilson and as always, Andy Grabner. Hi Andy. Hey Brian. I hope my voice sounds much better than last time, because I remember you said something sounded really off with my voice. How does it look today? It sounds fine. Fine? What do you mean? You remember when we did the last recording with, oh, that's right, with Adrian? It sounded like this, and I think I sounded like this, and you were all like this. I know, and I think I found the root
Starting point is 00:00:56 cause of that issue. It was a setting on my Zoom to lower... to make you sound like you had a lot more testosterone flowing through your body. Exactly, the testosterone button. Yeah, exactly. I think they just call it sampling frequency here, but yeah. Well, I just gotta say before we go, I'm tired today. I was up... what time is it now? It's 9:30 a.m. here, but I was up until about 2 o'clock because a friend asked me to go see the new Quentin Tarantino movie, Once Upon a Time in Hollywood. And I was like, well, it starts at 9:45, but I always liked a lot of Tarantino movies,
Starting point is 00:01:36 and this one's getting great reviews. And you know what? For the two hours and 45 minutes, I don't think it's worth that much time. It's an all right movie, but I suffered for it, and not for good entertainment, so unfortunate. Anyway, I just gotta say that, so if I seem slow and down today, that's why. Yeah, but if you are, the good news is we always have an amazing guest that can fill in the void that you may leave behind. Because you are always such... And I always leave such large
Starting point is 00:02:05 voids in the episodes. Yeah, yeah. So, speaking of our guest, why don't you go ahead and introduce our guest, and let him introduce himself, and get it rolling. So, yeah, today another guest that I met in Iași, Romania, at DevExperience, a conference I'd been invited to speak at a couple of months back. And I guess instead of me introducing Adam, I think I'll just let Adam introduce himself, because I'm pretty sure Adam knows himself much better than I know him. So Adam, thanks for being on the show, first of all. And yeah, if you wouldn't mind just getting started with explaining a little bit about yourself, what your background is, and then I will explain kind of the topic of today.
Starting point is 00:02:51 Welcome, Adam. Wow, thanks a lot. And thanks for having me here. So I'm Adam Tornhill. I'm the founder of a company called Empear, where I work with code analysis, and I'm developing a tool called CodeScene. I've been a developer for a long, long time; I've been doing this since the mid-90s. And what might be a little bit different in my background is that I have my degree in psychology, which also happens to be a major interest of mine. And what I try to do these days is to kind of take my psychological perspective and put it on top of my technical background. Okay.
Starting point is 00:03:34 So that's me. And that's actually what was so fascinating for me. So I remember, I know you had two sessions at DevExperience. I was only able to see one, because I believe I had my session in parallel, something like that. But I remember you getting on stage. First of all, you are an amazing presenter, very entertaining. And then the topic that you presented, kind of visualizing data. I remember you were showing Git commit history and the analysis you ran on top.
Starting point is 00:04:06 And I said, wow, what is this? Is this CSI? And what is going on? I never thought about analyzing code behavior, code commit behavior, the way you do it. And I think that's also, later on, you know, we spent a little time together and then you explained a little bit more about your background. But I found it very fascinating what you, especially now also with the tool that you're building, can read out of, let's say, the trail that developers leave behind when they're committing code changes and all that stuff. So, pretty fascinating. Oh, thanks. Yeah, I'm happy to hear that. So yeah, this is something I've been doing for a long, long time.
Starting point is 00:04:46 And I started out with it like 10 years ago, maybe. At that time, I was in the middle of my psychology studies, and at the same time, I was working full time as a software consultant. So I constantly faced these challenges: given a large code base, where should we focus our efforts in case we want to improve something, and be pretty confident that we get a real return on any time we invest in it? So that's where I started out with it. So that means, you said as a consultant, that means you were typically brought into projects with an existing code base, taking it over, refactoring it or doing maintenance, or...
Starting point is 00:05:26 what did the typical projects look like where you faced these large, let's say, code bases? Yes, I would say that the typical project was something that was already ongoing. It was not necessarily a legacy code base, but it was something that had been developed over a couple of years. And my role, I was quite often hired as a technical lead or as an architect; I'm still not quite sure what that is. But one of my responsibilities was always to kind of try to make the development work a little bit more efficient, try to move forward a little bit quicker with new features,
Starting point is 00:06:07 that kind of stuff. And that always kind of ends up being about identifying and managing technical debt. Which, I mean, I guess technical debt on its own is just a huge, huge topic. So can you tell me a little bit about it? I know this is probably hard to fit into an episode; I know you wrote one or multiple books and you have your tool, CodeScene. For somebody that is just tuning in, what are, let's say, the one, two, or three top things
Starting point is 00:06:47 that people should be aware of, that people should do when they approach a new project? What are the first things that you analyze in order to figure out, you know, where's the biggest problem in this code base? Where are the biggest dependencies? Whatever it is,
Starting point is 00:07:02 what are the three things people should be aware of? So I think there are three things I can come up with immediately that I always look for myself. The first is, if we start with technical debt: just because some code lacks in quality or is badly written, that doesn't mean it's technical debt. It's only technical debt when we need to pay interest on it. And I think that was the key lesson for me. The second thing, which made a huge change in my ability to tackle large code bases, was to kind of come to the understanding that it's impossible to separate the technical side from the people side. And the people side of software is notoriously hard, right? Because the code itself is largely invisible, and the
Starting point is 00:08:01 people side even more so, right? We cannot look at a piece of code and know: is this code a coordination bottleneck for five different teams, or is it developed by a single person, so we might have a key-personnel risk? It's impossible to tell. So weighing in the people side is really, really important. And the third thing I would like to mention is that when doing improvements, whether it's improvements in the ways of working or improvements in the code quality that I mentioned, I find it really vital to tie them to some kind of business value. Otherwise, it's going to be really, really hard to pull it off. Hey Adam, I wanted to go back to the first point to get some clarification, for both myself and maybe some listeners who've only had a little bit of dealing with technical debt. When you describe
Starting point is 00:08:55 technical debt as, you know, things where you actually have to pay interest, can you explain what that means a little bit more in depth? Bad code isn't necessarily debt; it's code where you have to pay interest on it. Yeah, sure, I'd be happy to. So maybe I can share one of my traumatic stories from my years as a consultant to try to highlight what I mean. This was maybe 10 years ago. I was brought in on a project in order to try to improve what they called the delivery efficiency on it. At that time, I was a heavy user of things like static code analysis. And I'm still a fan of static code analysis.
Starting point is 00:09:40 I still use it on a daily basis. But what I did was basically I ran a static analysis tool on a pretty large code base, and that tool pretty quickly identified that the worst possible code, you know, heavy dependencies, low cohesion, lots of conditional logic, tended to be located in a particular component. So just to make sure, as I had found my first refactoring candidate here, I went over and talked to the people who I knew had worked on that part of the code. And they pretty much confirmed my findings: yeah, this code is a true mess to work with; we don't want to touch it at all if we have a choice. So I thought, wonderful, let's improve this part of the code,
Starting point is 00:10:26 and it's going to do wonders for our ability to deliver. So what we did was basically we took our two best developers and let them spend two months rewriting that component. And when they were done, we had some code that looked excellent and was a pleasure to review. It had very high test coverage. Everything looked just brilliant,
Starting point is 00:10:46 and the performance was also good. And what happened next kind of changed how I view software because this organization had very detailed metrics on delivery output and the pace of the development and all that kind of stuff.
Starting point is 00:11:02 And I was expecting wonders in that dimension. However, what happened was that there was no difference at all. And what that meant was that we had basically wasted two months of our two best developers doing something that didn't impact the business. And I kind of started to dig into this: why is that the case? We replaced some bad code with some really good code, and we didn't get the business impact. Well, it turned out that that part of the code, yes, it was a true mess, but it was a working mess. It was a mess that had been debugged into functionality and proven in use. And it was rarely, if ever, touched.
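A side note on the lesson in this story: one common way to operationalize it is to weight a static complexity score by how often each file actually changes, so a "working mess" that is never touched ranks below moderately complex code under constant churn. A minimal sketch in Python; the file names and scores are hypothetical, and this is not CodeScene's actual algorithm:

```python
# Rank refactoring candidates by complexity weighted with change frequency.
# All file names, complexity scores, and commit counts below are made up.

def rank_hotspots(complexity, change_frequency):
    """Return (file, score) pairs ordered by complexity * change frequency,
    highest first. Files that never change get a score of zero."""
    scores = {
        path: complexity[path] * change_frequency.get(path, 0)
        for path in complexity
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

complexity = {"billing.py": 70, "legacy_component.py": 95, "api.py": 40}
changes = {"billing.py": 120, "api.py": 60, "legacy_component.py": 2}

for path, score in rank_hotspots(complexity, changes):
    print(f"{path}: {score}")
```

Here the statically worst file, legacy_component.py, ranks last because it is rarely modified, which is exactly the trap the story describes.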
Starting point is 00:11:49 So that's when I kind of realized that complexity only matters in a context. If we have a complex piece of code and we never need to touch it, well, we probably have much more urgent matters. So that means, if I hear this correctly, just looking at a static code analysis tool that shows you what complexity you have in which part of your code doesn't give you any clear indication if you don't combine it with things like: how many problems ever occurred in this particular code, how often is it changed? I'm sure there are other metrics too, right? So the context is,
Starting point is 00:12:22 what you understand by context is adding additional metrics around just the technical complexity, right? Like the business impact, the quality, and all that stuff. Yes, exactly. So static analysis techniques are simple code inspections. They are great for identifying technical debt, or identifying problems in the code, but they are not particularly good at prioritizing them. So for the priority dimension, we need something else, right? And that's what I try to popularize with my books and presentations: to tap into the behavioral data of the organization, look at where the developers actually do most of the work, and use that to prioritize. So can you give us a couple of additional hints now? I know you
Starting point is 00:13:09 covered this in your presentation, the way you were analyzing, as I mentioned earlier, Git commits and other things. But can you go into a little more detail here? What type of data, when you approach an organization, does the organization for instance already have but is not looking into? Or what information do you need for your forensics that organizations typically don't have but should have? So, yeah, I try to use the simplest possible metrics, because those tend to have a pretty good predictive value, they are also easy to explain, and they tend to be fairly intuitive once you get used to them.
Starting point is 00:13:55 So the most valuable data source, if you want to do code forensics or if you want to prioritize technical debt, is a data source that everyone already has. And that's our version control system. So my first step is always to tap into version control, because version control is something that, I think, we as an industry have mostly used as an overly complicated backup system. But then, almost as a side effect,
Starting point is 00:14:23 we have built up this wonderful data source that kind of tells the evolution of our system. So I tap into that, and I start by calculating simple things like the change frequency of each piece of code: how often is a particular piece of code modified? And the reason I'm fascinated by change frequencies is that if you plot them in a simple graph, you will see that they form a Pareto distribution, which basically means, you know, like the 80/20 distribution, only it tends to be even steeper. And what that basically means is that most of your development activity is in a very small part of the code base, and most of your code tends to be in the long tail, which means it's rarely, if ever, touched. So those are the parts of the code where
Starting point is 00:15:11 we can actually live with some code quality issues. But the head of the curve, the things we work with all the time, that's where technical debt starts to become really, really expensive, because I like to view the change frequency of a piece of code as a proxy for the interest on any technical debt that we find there. Does that make sense? It makes a lot of sense, yeah. And especially, so obviously the code that has to be touched a lot, and this can, I mean,
Starting point is 00:15:40 I guess we need an additional dimension though, because I guess you need to answer the question: why do people touch a particular piece of the code a lot? Is it because it is really so central, and this is where all the features get added? Or does it have quality issues? Or is it because of bad architecture that everybody needs to touch the code? I guess there's an additional component.
Starting point is 00:16:02 But yes, it completely makes sense. Because if you can make that code base easier, it means that the developers in total, right, if you sum it up, need to spend less time, and therefore you have a much bigger impact. I guess that's kind of the way I understand it. Yeah, I think that's a very good summary, and much shorter than my explanation. I really like it. This is why we call Andy the summarizer. Yeah, but I think that's really important, because those parts of the code with the very high change frequency, that's what I refer to as hotspots in my books and in my tooling.
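As an aside, the change frequencies Adam describes can be computed from data every team already has. The sketch below counts file occurrences in the kind of output produced by `git log --format= --name-only` (one file path per line, commits separated by blank lines); it is a simplified stand-in for a real Git integration, and the sample log is invented:

```python
from collections import Counter

def change_frequencies(git_log_output):
    """Count how many commits touched each file. Expects the output of
    `git log --format= --name-only`: file paths, blank lines between commits."""
    counts = Counter()
    for line in git_log_output.splitlines():
        path = line.strip()
        if path:  # skip the blank separator lines between commits
            counts[path] += 1
    return counts

# Hypothetical excerpt; in practice, feed in the real command output.
sample_log = """\
src/payment.py
tests/test_payment.py

src/payment.py

src/util.py
src/payment.py
"""
print(change_frequencies(sample_log).most_common(1))  # the head of the curve
```

Plotting the sorted counts from a real repository is usually enough to see the steep Pareto shape he mentions.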
Starting point is 00:16:46 And I always claim that just because something is a hotspot, that doesn't necessarily mean it's a problem, right? You might pick up a hotspot. It happens occasionally and the code looks pretty good. In that case, you're in a very good position. But on the other hand, even minor quality issues in a hotspot can amplify pretty rapidly due to the high change frequency and become really expensive and hold back the whole organization. Do you have a chance?
Starting point is 00:17:10 So when you look at, let's pick Git as an example, if you look at the Git history, you can only detect change frequency; you cannot really detect or calculate how much time actually went into the change, unless you could combine it with other tools, let's say a tracking tool for tickets, like a Jira or something like that. Do you do that as well?
Starting point is 00:17:36 So that you combine the frequency with the effort that went into this particular change? Yes, I do those kinds of analyses as well. And many organizations indeed have the data necessary to do it. So basically what I do is I pull in the data from, let's say, Jira, which tends to be fairly heavily used, and I check not only how much time is spent on each hotspot, but also what kind of work is done there. And this is something really interesting, because when you have a hotspot, you want to find out: all right, is this a lot of activity
Starting point is 00:18:19 because we're implementing new features? Or is the majority of the work bug fixes, which might indicate a different kind of issue? I'm also curious, from the point of view of identifying a hotspot, and I don't know if I'll frame this well, especially having never been a developer. But I'm imagining there's a point where you see a section of code that gets changed so often that that then lends itself to a change in the code, not because of bugs or something. Here's a pretty weak example, but I don't know how else to qualify this. Let's say you had an API input that took three variables,
Starting point is 00:19:08 and then another team comes along and wants to use that, but they need four, so then you change the code and allow it to accept four. Someone else then needs to accept five, and you keep making these changes, and you're noticing all these changes until somebody finally comes around and says, hey, we're seeing all these changes
Starting point is 00:19:23 because we're putting a hard limit on the number of inputs into this. We can refactor this to take n number of inputs, and then we don't touch the code anymore. Then suddenly that no longer becomes a hotspot, because you've refactored the code to handle the way the typical change requests come in. Is that a scenario that plays out,
Starting point is 00:19:43 or is that just something that... Yeah, yes, it can definitely happen, right? So what you're referring to, if I get it right, is basically that a hotspot tends to cool down for some reason? Correct. See, now you did the perfect summary. So yes, it happens, but I have to say it's a relatively rare case. And what I tend to see, and this is the beauty of version control data, is that you immediately have access to the full history of the code base, right? So when I find a hotspot, I'm also looking into the trends. So I look at: all right, how rapidly has this hotspot grown?
Starting point is 00:20:24 How quickly has it accumulated code complexity? And what you tend to see is that if you pick up a system, the modules that are hotspots now were most likely an issue already a year ago, or maybe two years ago. And I think that's where the people side might come in. Because what tends to happen with hotspots is that they also tend to be magnets: they tend to attract many different developers and many contributions from different developers. And that makes it very, very hard to refactor and act upon the information in some scenarios. So it's like hotspots are black holes: they are sucking in all the additional resources, people that try to solve the hotspot or just get attracted to the complexity. That's kind of interesting. It could be, right?
Starting point is 00:21:16 Yeah, it could be. I mean, there are, of course, cases where the hotspots look perfectly clean. But more often than not, the reason a piece of code is changed frequently is that it has good reasons to do so. And a typical reason is that it has too many responsibilities; it's low in cohesion, right? So as a consequence of that, it also tends to attract contributions from many different
Starting point is 00:21:39 developers, potentially different teams working with different features, but all ending up in the same hotspot. So those hotspots tend to become coordination bottlenecks as well. What about and maybe this was touched upon earlier, but what about seeing the correlation between two or three hotspots? Meaning you like the example from from Brian was I thought was pretty good. If you make a change to an API, you may also need to make a change to whoever calls the API. And maybe you have multiple clients of the API.
Starting point is 00:22:12 So with every change of one API, you need to change multiple clients. Do you also do that kind of dependency analysis between hotspots? I do something very similar that helps us answer those questions: I do something I call change coupling. So change coupling isn't a traditional dependency analysis. You know, in a traditional dependency analysis, we look at properties of the code: we look at which parts of the code depend upon each other, which parts of the code use each other.
Starting point is 00:22:49 What I do instead is that I look at the behavior of the developers. So I turn to version control and I see when a particular team or an individual developer touches this piece of code, they also tend to modify these other parts of the code. So I kind of uncover the patterns of the developers. And that helps to highlight those change patterns where you you might modify an API, then you go and modify a service and you maybe modify your data model. You can pretty much uncover the change patterns in your code base.
Starting point is 00:23:15 And this is something I use to reason about the cost of change, the change impact. Oh, so that means you can even do predictive things. You can say: if you are going to change this, you will have to change that many lines of code in other complex areas, and it's going to take you that much time. Is that something you can do too? Yeah, in fact, in CodeScene we actually have a CI/CD integration, where we can hook into a continuous delivery pipeline.
Starting point is 00:23:42 Yeah. And one of the things we do there is that we have knowledge of this, right? Since we already scanned the code base, we know that this cluster of hotspots tends to be changed together, right? So in the CI/CD pipeline, we can actually detect omissions. We can say things like: all right, you modified this particular hotspot; when your colleagues do the same thing, they tend to modify this other piece of code as well.
Starting point is 00:24:06 Did you forget about it? Oh, that's pretty cool. So, you said you integrate this into your CI. Are you also integrating it with IDEs, with the dev tools? Wouldn't that be even cooler?
Starting point is 00:24:19 If you can say, hey, Andy, you're changing this class, don't forget about these other five classes, because Adam and Brian have changed them every time they touch this code. Wouldn't that be a cool integration into the IDE too? It would be a wonderful integration. And if you go back to my previous book,
Starting point is 00:24:39 Your Code is a Crime Scene, which I wrote in 2014, I actually think I have a section where I talk about that integration as a direction for the future, right? And I still remember when I wrote that, that I thought that, all right, this is something I'm going to have in a couple of months. And I'm still not there, but it's on my roadmap. You know what that sounds like? Which kind of highlights the brilliance of it is, I don't know if you all use, I know Andy uses Outlook,
Starting point is 00:25:08 but a lot of times in your email clients, you'll make a reference to a picture or a file or something. And if you go to hit send, it says, hey, did you mean to attach a file? You know, to me, this is the same sort of application of that. We see what you're doing
Starting point is 00:25:24 and we're noticing what other things... you know, you're basically looking at what the patterns of things are that occur when X action is taken, and making the suggestion to do those others. So it's just a more complex, code-level reference system for that. I think that's really, really, really cool.
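The "did you forget about it?" check the three of them are describing could be sketched as a small gate over precomputed coupling counts; the threshold, file names, and message wording below are invented for illustration:

```python
def missing_coupled_files(changed, coupling, min_cochanges=2):
    """Given the files in the current commit and historical co-change counts
    (a dict mapping sorted file pairs to counts), return warnings for files
    that usually change together with the touched ones but are absent."""
    warnings = []
    for (a, b), count in coupling.items():
        if count < min_cochanges:
            continue  # too little history to warrant a warning
        if a in changed and b not in changed:
            warnings.append((a, b, count))
        elif b in changed and a not in changed:
            warnings.append((b, a, count))
    return warnings

# Hypothetical history: handler and service were modified together 7 times.
coupling = {
    ("api/orders.py", "service/order_service.py"): 7,
    ("api/orders.py", "docs/changelog.md"): 1,
}
for touched, forgotten, n in missing_coupled_files({"api/orders.py"}, coupling):
    print(f"You modified {touched}; it changed together with "
          f"{forgotten} in {n} earlier commits. Did you forget it?")
```

Run in a pipeline, this would fire exactly the kind of reminder described: a nudge rather than a hard failure, since coupled files sometimes legitimately diverge.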
Starting point is 00:25:43 And it also shortens the feedback loop, because if I'm checking in code and then the CI runs, and it takes 5, 10, 15, 20 minutes, and then I get the notification that I forgot something 20 minutes ago, it would be much better if I got immediate feedback in the IDE, because that actually changes
Starting point is 00:26:00 my behavior and it teaches me something, right? I mean, that's the thing. But even in the CI/CD pipeline, getting that notification can help shorten that time to resolution. That's true. But in the sense of shifting all the way left, in this case, it would actually be possible to go all the way into the IDE
Starting point is 00:26:18 and say, hey, Andy, don't forget to change this file here as well because you're most likely going to change it anyway later on. So we can expect your submission to GitHub on this, Andy, in a month? Yeah, exactly. It is, in fact. I think it's a really important idea because, like you pointed out, I actually got this idea from recommendation engines. I buy a lot of books.
Starting point is 00:26:48 I tend to buy them from Amazon. And they always had this feature there, right? Customers who bought this book also bought this and that book. So that's how I started to think about it. But the reason to plug it into IDEs would be not only as an early warning system; it would also be very useful as a code reading tool, to familiarize oneself with a code base, right? Because I'm pretty sure that writing code is not the main problem we have in our industry. Our major problem is to understand existing code. And any help we can get there would be a huge win. So that's why I would like to go to the
Starting point is 00:27:26 IDEs: to kind of help us navigate an existing code base based on the collective intelligence of our colleagues. So basically, what you explained earlier: 10 years ago, when you started, you did your analysis and then you walked over to the folks that actually had, let's say, tribal knowledge about that code. And what you're saying is that the tribal knowledge is in our existing tools, in our version control, in our Jiras, and we can automatically bring it to every developer that starts with a new code base, right? That's basically what you're saying. Yeah, that's what I'm
Starting point is 00:28:05 saying. And you can get really far with it. Of course, you won't get perfect information, because you will always have contextual knowledge, right? Of course, yeah. But you can get pretty far. Yeah. And I think the edge cases I've come across are things like, you know, you come across a really nasty hotspot and you talk to the people who developed it, and they tell me: all right, yeah, it's a problem, but we have a replacement ready; we're going to replace it with this new library in two weeks, right? So in that case, it doesn't make so much sense to focus on it. But in general, I think using these techniques for onboarding is something that has saved me weeks and months throughout my career. You can get into a new code base surprisingly fast. Yeah. Hey Andy, here's an idea.
Starting point is 00:28:54 I don't know, I think it's a couple of steps away still, but imagine if you had a robust tool monitoring your production environment, which understood all the service dependencies, right? Now, this is not an ad. You find an issue with one of your services. You understand which services make calls into those services, so that you can understand the dependent services. You tie that data from those services back to the code dependencies from CodeScene, right?
Starting point is 00:29:27 So that you can understand, when an alert is raised in production, it might be a certain service call, which can then say: this is reliant on XYZ services that are making calls into it, and here are all the bits of code that you might have to account for on each of those services if you make a change to improve this code. Basically tying both data sets together into one big, giant, amazing output. Yeah, so basically what you're saying is we have to build an integration between CodeScene and Dynatrace. Basically, yeah.
Starting point is 00:30:00 That would be fantastic, right? So pretty soon, in a couple of weeks, we're going to start opening up our APIs, so you will have all the data you need. Perfect. Now, that's really cool, because, as Brian said, and this is not necessarily a commercial now, but modern monitoring tools obviously have a lot of data, especially those that are doing end-to-end tracing. So we have a lot of dependency information about the problem. And when we send out the alert, via Slack or whatever tool we are going to integrate with, we end up coupling this information
Starting point is 00:30:40 with, you know, the code-level information; and enriching the problem data with the data from tools like yours would extremely benefit the team that then actually has to work on the fix. I mean, that's a really cool thought, actually. I like it. Yeah, and I think it fills an important gap as well, right? Because my experience with many modern microservice implementations, and service-based implementations in general, is that, you know, while each individual service might be fairly easy to understand in isolation,
Starting point is 00:31:17 the emerging system behavior is anything but simple. So, yeah, I agree with that. Yeah, cool. So, Adam, I know we mentioned the product, CodeScene. We will put out the website link; I think it's www.empear.com. Is that the way you correctly pronounce it? Yes, that's the name of my startup. So, there are actually two CodeScenes, which might be confusing.
Starting point is 00:31:50 So we have the on-prem version, which you host yourself, and then we have codescene.io, which is cloud-based. So that might be another link where anyone who's interested can try it out for free. Cool. And I think the great thing about products like yours is that you built this based on your own, I don't want to say suffering, but basically you suffered in this situation.
Starting point is 00:32:15 You come up with a solution. And then you decided that, hey, this is actually something that is really useful. And let's build a tool that helps you and then helps others as well and this is why you know tools like tools like this just really will have an impact on the way we do software engineering and the way we are we are going to think about software quality and better architectures and um so so the i would assume and again and again, a little commercial I think is great
Starting point is 00:32:45 because you are doing us all a favor by sharing your experience here. So I assume the way CodeScene works, you just tap into the APIs of, let's say, a Git or a Jira. So that means I just need to point it to the tools, and then you pull out all the data and do your magic? Yes, that's basically it. You specify your Git URL, could be Bitbucket, GitLab, anything, and then press a button, and that's basically it. Cool. And then you are doing your hotspot analysis, the stuff we discussed earlier. And I mean, I just looked at the website, and I think you had a couple of screenshots, I'm not sure how many I remember from your presentation,
Starting point is 00:33:30 but there were some really interesting things on how you visualize this forensic data and how you can drill in. Does your tool also give recommendations? Because, as you said earlier, sometimes you can be misled by data. So I assume you do not only analyze the data and visualize it nicely, but also give specific recommendations: this is what you should do?
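As a side note for readers: the "specify your Git URL and press a button" workflow that comes up here starts from something anyone can try by hand, counting how often each file shows up in the commit log. A minimal sketch of that first step (this is not CodeScene's implementation; the function name and sample log are invented for illustration):

```python
from collections import Counter

def change_frequencies(git_log: str) -> Counter:
    """Count how often each file appears in the output of
    `git log --name-only --pretty=format:`, where blank lines
    separate commits and every other line is a touched file."""
    return Counter(
        line.strip()
        for line in git_log.splitlines()
        if line.strip()  # skip the blank separators between commits
    )

# Invented log excerpt: three commits touching three files overall.
sample_log = """
src/billing.py
src/util.py

src/billing.py

src/billing.py
src/api.py
"""

freq = change_frequencies(sample_log)
print(freq.most_common(1))  # src/billing.py is the top hotspot candidate
```

On a real repository you would feed it the output of `git log --name-only --pretty=format:`; the files at the top of the ranking are the first candidates for the hotspot analysis being discussed.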
Starting point is 00:33:57 Yeah, so that's an area where I know we can improve. The recommendations are definitely there. If you find a hotspot, CodeScene will present it to you, CodeScene will rank it for you. You click on it and you can immediately see a list of the major issues that we have found
Starting point is 00:34:21 together with our recommendations. But we could make that even better. So that's one of the things I'm actually working on right now. Very cool. And now, I mean, I'm looking at the website right now, and there are a lot of cool things here. I see the social patterns you talk about: obviously, developers can write better code faster together, find the hotspots, the hidden risks. That's also interesting, right? I think risk is just as important, because if you're changing code, you
Starting point is 00:34:56 can predict the risk, obviously, of a code change, right? Yes, that's correct. That's also one of the main use cases I see for these kind of techniques in continuous integration, right? So what tends to happen in many organizations, particularly large organizations, is that code reviews tend to become bottlenecks. So the idea is that if we can predict the risk of each change set, then you can use that to prioritize code reviews as well.
Starting point is 00:35:25 Right. So you have a high-risk change, you probably want to put two people on verifying and inspecting that. And then you have lower-risk changes that you can roll through a little more quickly, so you know you use your time wisely in the code review process. I didn't even think about that. That's awesome. Yeah, because you're right: as you scale your development organization, if you're enforcing code reviews on everything and you don't differentiate by the priority or risk of a code change, and you always assign the same number of people, you won't be able to scale, because you're spending too much time on, let's say, quote-unquote less important code reviews versus more important ones. And I need to write this down. This is really cool.
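The review-triage idea can be sketched with a toy risk score. The inputs and weights below are invented placeholders (a real predictor would be calibrated on historical change data), but the mechanic, ranking the review queue by predicted risk, is the one described here:

```python
def change_risk(lines_changed: int, files_touched: int, touches_hotspot: bool) -> float:
    """Naive risk score: bigger, more scattered changes to known
    hotspots rank higher. The weights are arbitrary placeholders."""
    score = lines_changed * 0.1 + files_touched * 1.0
    if touches_hotspot:
        score *= 2  # changes to hotspots get extra scrutiny
    return score

# Three hypothetical pending change sets.
changes = {
    "fix-typo":      change_risk(2, 1, False),
    "refactor-core": change_risk(400, 9, True),
    "add-endpoint":  change_risk(60, 3, False),
}
# Review queue, riskiest first: put two reviewers on the top entry,
# wave the bottom one through quickly.
queue = sorted(changes, key=changes.get, reverse=True)
print(queue)
```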
Starting point is 00:36:20 So after a while, we will start to slip, and there might be some serious issues that just go through because we're not paying attention, we're too tired. This is really cool. Andy's taking notes. No, I'm really taking notes. Yeah, I know. I'm wondering when the blog will come out. So this is, Adam, one of the great side effects
Starting point is 00:36:45 of our podcast is we get to talking and it gives Andy a bunch of great ideas to go you know off and run
Starting point is 00:36:52 and play with and explore and it's always great to hear when Andy gets sparked or inspired but whatever I do
Starting point is 00:37:00 with it obviously I give you all the credit I'm not going to be somebody that just takes an idea from somebody and sells it as my own, but no, this is really cool. Um, Adam, I have, um, so I want to jump to, you made an interesting comment in the very beginning when you explained, I asked you for three things. You said technical debt,
Starting point is 00:37:22 you said to separate the technical side from the people side. And the last thing you said was: improvements, only tie them to the business value. And obviously, I believe this is what resonates most with decision makers in the end. So, what do we really improve? Can you give us a quick idea of how you can quantify the business value of a change just by looking at the data that you have right now? Or is there additional data we need, additional input, to tie an improvement or a change to the business value? Yes. So, in my experience, because I'm often in that situation, I'm potentially the most frequent CodeScene user in the world, right?
Starting point is 00:38:12 I use it every single day. I do services around it stuff, right? So I'm often in that situation where I have to explain something deeply technical, code quality issues, technical depth, and I have to explain it to non-technical managers. So one thing I've found that works pretty well is to show the hotspot visualizations that we do, because those are almost like a map of your system, and that kind of shows the importance of your different parts of the code. And if you combine that with the trends for each hotspot, so you can see the evolution of properties like complexity within the hotspot, number of developers have to work on it, then it's usually a pretty easy sell to non-technical managers.
Starting point is 00:38:55 They tend to realize that, all right, we really need to act upon this, right? And to me, that has been one of the most important use cases because now you can suddenly tie together our deeply technical, usually inaccessible word of code with the business people. But I do think that to really have something that you can measure in time and money, I think you have to go for things like cycle times and lead times. So you're most likely aware of Accelerate that came out last year, right? Yeah. So I think that book has been, I think it's one of the most important books over the past years. Because they're actually showing this causation between the different cycle times and the business values like increased profitability, customer satisfaction, all that good stuff that business people want to have.
Starting point is 00:39:51 Right. So if you could set the baseline that these are our cycle times, let's say this is what the time it takes from a customer detects and reports a defect until we have it fixed in production, right? That would be a really good measure. And see, can we cut that in half by paying off the technical debt in these critical parts of the application? Yeah, so it's in the end.
Starting point is 00:40:19 So the business value in this case is really about efficiency: time to market, and time to remediate, we call it. There are two metrics that I just used in some of the work I've done: MTTI and MTTR. MTTI means mean time to innovate, so how long does it take us to innovate versus spending time on, as you said, bug fixing and working on technical debt. And then MTTR, mean time to remediate. These are exactly two of the metrics that are very easily understood by the business.
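MTTR as Andy defines it, the time from a customer-reported defect to a fix in production, falls straight out of ticket timestamps. A sketch with hypothetical ticket data (the function name is made up):

```python
from datetime import datetime
from statistics import mean

def mttr_hours(tickets):
    """Mean time to remediate: average of (resolved - reported),
    in hours, over closed defect tickets."""
    durations = [
        (resolved - reported).total_seconds() / 3600
        for reported, resolved in tickets
    ]
    return mean(durations)

# Hypothetical defect tickets: (reported, fixed-in-production).
tickets = [
    (datetime(2019, 9, 1, 9, 0), datetime(2019, 9, 1, 17, 0)),   #  8h
    (datetime(2019, 9, 2, 10, 0), datetime(2019, 9, 3, 10, 0)),  # 24h
    (datetime(2019, 9, 4, 8, 0), datetime(2019, 9, 4, 12, 0)),   #  4h
]
print(f"MTTR: {mttr_hours(tickets):.1f} hours")  # MTTR: 12.0 hours
```

Track the same number before and after paying off technical debt in a hotspot, and you have the business-facing baseline Adam describes.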
Starting point is 00:40:56 And if you can then show with your analysis how to give developers more time to innovate, that's one great thing, and also how to improve code quality so that developers can react to problems faster, then, yeah, it makes a lot of sense. Okay, very cool. Very nice. And just for reference, the book Accelerate you mentioned,
Starting point is 00:41:19 I'm assuming that is the science of lean software and DevOps by Nicole, I hope i get this name right for for forsgren uh jess humble and gene kim is that the uh one you're referencing yes that's correct right so for anyone listening that has been officially dubbed now it's the most important book yeah we had we had gene kim on the podcast. How was it? Two years ago? It was a while ago. Two years ago. And he was talking about the DevOps Handbook.
Starting point is 00:41:50 That was pretty cool. And yeah, Nicole, she's with Dora, right? She's been doing all the DevOps, state of the DevOps report analysis, and she's been driving that. And yeah, the findings are fascinating, what they came come up with or came up with and definitely helps us to better communicate to the business how important it is
Starting point is 00:42:11 to invest in quality, in automation, in working on technical debt. Yeah, and what I really admire about their work is that they actually managed to put some real science on software. We do have a research community in software; we do know things about software; we do have data to show it. But there tends to be a huge gap between research and practice, and Accelerate, I think, serves at least as a bridge for some aspects of software that turn out to be really important. Very cool. Adam, is there anything else that a listener should know?
Starting point is 00:42:57 Or did we cover at least the basics of what people need to know when it comes to leveraging and harvesting the data that we have lying around, but not harvest correctly in order to figure out where are our hotspots, you know, using it for risk analysis, as you said. Is there anything else listeners should know before we leave them go? And then hopefully they explore code scene your blogs your books yes so i think there's that's one really important thing that i really like to add and that is that all that kind of data we have been talking about all that kind of tooling and visualizations they never stand on their own so they need something as a compliment that that compliment is you dear listener.
Starting point is 00:43:53 So these techniques are there to kind of help you focus your attention and expertise to the parts of the code that are likely to need it the most. But the decisions and actions are always going to be yours. So I think that's important to point out. That's true. Let's say, what's it called? It's an action call, a call to action. Call to action, yeah. Exactly.
Starting point is 00:44:15 So speaking of calls to action, should we call the Summaryator to action? I think we should. Absolutely, yeah. Let's do it. All right. Well, I've been fascinated when I listened to you in Romania, where you kind of opened up my eyes on what we can do with data that I didn't even know all of us have. The rich history of the Git commit history, data that we have in our change management system, in our version control, in our ticketing system. So what I learned today, and hopefully everybody learned today, is that detecting hotspots is important, but like what you said earlier, just detecting a hotspot purely alone on, let's say, technical complexity
Starting point is 00:44:57 is not enough. You need to give it context. So I really liked what you did with the change frequency, so figuring out obviously what changes a lot but then correlating this with how much time is spent on actually for developers to do things are these bug fixes, is this new development so give it more context to data
Starting point is 00:45:18 because just technical complexity based on static code analysis tool doesn't really tell you whether it is a smart move to change that code if it really doesn't have a whole lot of impact in the end. I also really liked the stuff when you said the behavior of your team members. So if somebody changes something here,
Starting point is 00:45:43 then most likely you're changing something in another part of the code. So seeing team behaviors and using this for recommendations in the future, "we have a feature requirement here for you," putting it into the IDE. And then I think the biggest thing in the end is, unless you have a lot of energy and can do this as a grassroots movement from the technical side, you should most likely first go to the business and show them what can be improved in terms of becoming a better organization: delivering better quality, reducing cycle time, improving efficiency. So selling it to the business first is a big thing. And if people want to try out the stuff that you've been writing about, the stuff you just heard, go to empear.com and check out CodeScene as one of the products out there.
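The "change this file and you'll probably also change that one" pattern Andy summarizes is usually called change coupling (or temporal coupling), and it can be mined from commit history alone by counting how often pairs of files appear in the same commit. A small sketch over an invented history:

```python
from collections import Counter
from itertools import combinations

def change_coupling(commits):
    """Count how often each pair of files is modified in the same
    commit. High counts hint at hidden dependencies between parts
    of the code that no static analysis would connect."""
    pairs = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical commit history: each list is one commit's file set.
commits = [
    ["order.py", "invoice.py"],
    ["order.py", "invoice.py", "mail.py"],
    ["order.py", "invoice.py"],
    ["mail.py"],
]
coupling = change_coupling(commits)
print(coupling.most_common(1))  # [(('invoice.py', 'order.py'), 3)]
```

Pairs that co-change in most commits are exactly the candidates for the IDE-level "you changed X, consider Y" recommendations mentioned above.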
Starting point is 00:46:39 And if anyone ever has a chance to see you live on stage, I can only encourage them: it was a real pleasure seeing you there, and I learned a lot. Thank you so much. Thanks a lot for your kind words, and thanks a lot for having me here. I really appreciate it. Thanks. Hey Andy, for once I had a couple of thoughts, too.
Starting point is 00:47:18 Some of it you may just look at as noise. But I think it's always worth looking at what you have access to because what Adam and team and all did is, you know, there's this wealth of information sitting in GitHub and sitting in other tools where if you take a look at it and really start thinking, what can you, can you use that and spending the time to see what you have available and how you might be able to apply it, you might be sitting on a goldmine, not just financially by saying, hey, we're going to start a company, but also for a goldmine of information to help you improve your processes. And if you then use that information to improve your processes, you can free up more time to take a look at what's available to do this. So it's a cycle that feeds into itself. So don't ignore all the different various data points that you have access to. Obviously, you don't want to use them unnecessarily, but see if you can bring value to them. The other thing I wanted to just kind of bring up was kind of more like excitement for the near future,
Starting point is 00:48:27 hopefully near future, I hope. When we take a look at tools like CodeScene, Dynatrace, I want to also reference back to Akimis and Probit. So Akimis is the one, Andy, if you recall, where they're going to use some AI engines to tweak, like say your JVM settings to get performance improvements. Probit is the company that's taking ROM data and generating Selenium scripts and possibly in the future, maybe some load testing type scripts. What I'm really excited to see is all tools now are being developed with this accessibility in mind, this API accessibility, this ability to get data out so that others can ingest it. And just thinking in hopefully the near future, taking all these data points, as we mentioned earlier, the idea of marrying some of the Dynatrace data with the code scene data. But also when an issue occurs, maybe you plug that into Akam,
Starting point is 00:49:26 you know, you have the Akamas context as well to see, okay, there's an issue, code performance was poor, but is that because of the code or is it because of a setting on the JVM or the container, whatever might be in there, the container, the OS, and just the marriage of all these tools, right?
Starting point is 00:49:42 They're all separate companies, but there's no thing that prevents, because of the openness of it all, there's nothing that prevents everyone from using the data in a cooperative way. And I'm really excited to see how this might really play out, not just in theory, but in practice in the future.
Starting point is 00:50:00 And it's because of all these wonderful tools people are building that we might see some of this. Mm-hmm. And maybe as a last thought for adam while obviously it would be great to have an integration with dynatrace you should look into the work around open telemetry which is the open source so the open standard around tracing and monitoring with Anatoly is also part of. But if you are really intending on building integrations in the future, then you may definitely want to look into open telemetry. And I know, Brian, I did the recording with Sonia and with Danielle on open telemetry and also talking about open tracing trace context so the recording is out there and i can i think it just makes a lot of sense to also broaden kind of your
Starting point is 00:50:55 scope of what data you can ingest by you know adhering to these standards and looking at what open telemetry can give you. Hey Adam, last thing. Do you do social media? Do you have anything else you want to promote? Do you do any appearances coming up in the fall that you're aware of that you might want to share? Yeah, sure. So
Starting point is 00:51:17 you will find me on Twitter as Adam Thornhill. I'm blogging a lot at Empire.com and I also have my personal blog at AdamTornhill. I'm blogging a lot at empire.com and I also have my personal blog at adamthornhill.com. And yes, I'm speaking at several conferences this fall.
Starting point is 00:51:35 I will be keynoting Øredev in Malmö, Sweden. I will also go to Copenhagen and a bunch of other conferences. So I hope to see some of you around there. Awesome. Well, thank you very much for being on the show, Adam. We look forward to seeing what else you come out with, and we'll be sure to be following you.
Starting point is 00:51:56 Thanks. Thank you. Bye-bye. Yeah, thanks a lot for having me here.
