PurePerformance - Code as a Crime Scene: Diving into Code Forensics, Hotspot and Risk Analysis with Adam Tornhill
Episode Date: September 30, 2019
Are you analyzing the dependency between change frequency, technical complexity, and the growth and length of code change hotspots? You should, as it helps you tackle technical debt and risk assessment the right way! In this podcast Adam Tornhill (@AdamTornhill) explains how he applies data science and forensic approaches to data we all have in our organizations, such as Git commit history, ticket stats, static & dynamic code analysis, and monitoring data. He gives us insights into detecting code hotspots and how we can leverage this data in areas such as risk assessment and the social side of code changes, as well as explaining to the business why working on technical debt is going to improve time to market. Also make sure to check out CodeScene: a powerful visualization tool that uses predictive analytics to find social patterns and hidden risks in your code.
Adam Tornhill on Twitter: https://twitter.com/AdamTornhill
Adam Tornhill's blog: https://www.adamtornhill.com/
CodeScene: https://www.empear.com
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and as always, Andy Grabner. Hi Andy.
Hey Brian. I hope my voice sounds much better than last time because I remember you said something sounded really off with my voice. How does it look today?
It sounds fine. What do you mean?
You remember when we did the last recording? Oh, that's right, with Adrian. It sounded like, and I think I sounded like this, and you were all like this.
I know, and I think I found the root cause of that issue. It was a setting on my Zoom to lower, to make you sound like you had a lot more testosterone flowing through your body.
Exactly, the testosterone button. Yeah, exactly.
Yeah, I think they call it just a sampling frequency here. But yeah, well, I just gotta say before we go, I'm tired today. What time is it now, 9:30 a.m. here? But I was up until about 2 o'clock
because a friend asked me to go see the new Quentin Tarantino movie,
Once Upon a Time in Hollywood.
And I was like, well, it starts at 9.45, but I always liked a lot of Tarantino movies,
and this one's getting great reviews.
And you know what?
For the two hours and 45 minutes, I don't think it was worth that much time. It's an all right movie, but I suffered for it, and not for good entertainment. So unfortunate. Anyway, I just gotta say that. So if I seem slow and down today, that's why.
Yeah, but if you are, the good news is we always have an amazing guest that can fill in the void that you may leave behind, because you are always such...
And I always leave such large voids in the episodes, yeah.
Yeah. So, speaking of our guest, why don't you go ahead and introduce our guest and let him introduce himself and get it rolling?
So, yeah, today another guest that I met in Iași in Romania at DevExperience, a conference I'd been invited to speak at a couple of months back.
And I guess instead of me introducing Adam, I think I'll just let Adam introduce himself
because I'm pretty sure Adam knows himself much better than I know him. So Adam, thanks for being
on the show, first of all. And yeah, if you wouldn't mind just getting started with explaining
a little bit about yourself, what your background is, and then I will explain kind of the topic of today.
Welcome, Adam.
Wow. Thanks a lot. And thanks for having me here.
So I'm Adam Tornhill. I'm the founder of a company called Empear, where I work with code analysis, and I'm developing a tool called CodeScene.
So I've been a developer for a long, long time.
I've been doing this since the mid-90s.
And what might be a little bit different in my background is that I have my degree in psychology, which also happens to be a major interest of mine.
And what I try to do these days is to kind of take my psychological perspective and put it on top of my technical background.
Okay.
So that's me.
And that's actually what was so fascinating for me. So I remember, I know you had two sessions at DevExperience.
I was only able to see one because I believe I had my session in parallel, something like that.
But I remember you getting on stage.
First of all, you are an amazing presenter, very entertaining.
And then the topic that you presented and kind of visualizing data.
I remember you were showing Git commit history and the analysis you ran on top.
And I said, wow, what is this?
Is this CSI?
And what is going on?
I never thought about analyzing code behavior, code commit behavior, like the way you do it.
And then later on, you know, we spent a little time together and you explained a little bit more about your background. But I found it very fascinating what you, especially now with the tool that you're building, can read out of, let's say, the trail that developers leave behind when they're committing code changes and all that stuff. So, pretty fascinating.
Oh, thanks. Yeah, I'm happy to hear that. So, yeah, this is something I've been doing for a long, long time.
And I started out with it like 10 years ago, maybe. At that time, I was in the middle of my
psychology studies. And at the same time, I was working full time as a software consultant.
So I constantly faced these challenges. Given a large code base, where should we focus our efforts in case we want to improve something
and be pretty confident that we get a real return on any time we invest in it?
So that's where I started out with it.
So that means, you said as a consultant, you were typically brought into projects with an existing code base, taking it over, refactoring it or doing maintenance? What did the typical projects look like where you faced these large, let's say, code bases?
Yes, I would say that the typical project was something that was already ongoing. It was not necessarily a legacy code base, but it was something that had been developed over a couple of years.
And my role, I was quite often hired as a technical lead or as an architect.
I'm still not quite sure what that is.
But one of my responsibilities was always to kind of try to make the development work a little bit more efficient, to move forward a little bit quicker with new features, that kind of stuff.
And that, yeah, always kind of ends up being about identifying and managing technical debt.
Which, I mean, I guess technical debt on its own is just a huge, huge topic.
So can you tell me a little bit about, I know this is probably hard to fit into an episode here.
I know you wrote one or multiple books and you have your tool, CodeScene. But for somebody that is just listening in, what are, let's say,
the one, two, or three top things
that people should be aware of,
that people should do
when you approach a new project?
What are the first things that you analyze
in order to figure out, you know,
where's the biggest problem in this code base?
Where are the biggest dependencies?
Whatever it is,
what are the three things people should be aware of?
So I think there are three things I can come up with immediately that I always look for myself. The first is, if we start with technical debt: just because some code lacks in quality or is badly written, that doesn't mean it's technical debt. It's only technical debt if we have to pay interest on it.
And I think that was the key lesson for me.
So the second thing, which made a huge change in my ability to tackle large code bases, was to kind of come to the understanding
that it's impossible to separate the technical side from the people side. And the people side
of software is notoriously hard, right? Because the code itself is largely invisible and the
people side even more so, right? We cannot look at a piece of code and know,
is this code a coordination bottleneck for five different teams, or is it developed by a single person, so we might have a key-personnel risk?
It's impossible to tell.
So weighing in the people side is really, really important.
And the third thing I would like to mention is that when doing improvements, whether it's improvements in the ways of working or improvements in the code quality, I find it really vital to tie it into some kind of business value.
Otherwise, it's going to be really, really hard to pull it off.
Yeah. Hey, Adam, I wanted to go back to the first point you mentioned, just to get some clarification, for both myself and maybe some listeners who've only had a little bit of dealing with technical debt. When you describe technical debt as, you know, things where you actually have to pay interest, can you explain what that means a little bit more in depth? Bad code isn't necessarily debt; it's code you have to pay interest on.
Yeah, sure. I'd be happy to do it. So
maybe I can share one of my traumatic stories from my years as a consultant to try to highlight
what I mean. So this was maybe 10 years ago.
I was brought in on a project in order to try to improve what they call the delivery efficiency on it.
So at that time, I was a heavy user of things like static code analysis.
And I'm still a fan of static code analysis. I still use it on a daily basis. But what I did was basically run a static analysis tool on a pretty large code base, and that tool pretty quickly identified that the worst possible code, you know, heavy dependencies, low cohesion, a lot of conditional logic, tended
to be located on a particular component. So just to make sure I had found my first refactoring candidate here,
I went over and talked to the people who I knew had worked on that part of the code.
And they pretty much confirmed my findings that, yeah, this code is a true mess to work with.
We don't want to touch it at all if we have a choice. So I thought, wonderful, let's improve this part of the code
and it's going to make wonders for our ability to deliver.
So what we did was basically we took our two best developers
and let them spend two months rewriting that component.
And when they were done,
we had some code that looked excellent,
and it was a pleasure to review.
It had very high test coverage.
Everything looked just brilliant
and the performance was also good.
And what happened next
kind of changed how I view software
because this organization
had very detailed metrics
on delivery output
and the pace of the development
and all that kind of stuff.
And I was expecting wonders
in that dimension.
However, what happened was that there was no difference at all. And what that meant to me
was that we had basically wasted two months for our two best developers doing something that
didn't impact the business. And I kind of started to dig into this. Why is that the case? We replaced some
bad code with some really good code and we didn't get the business impact. Well, it turned
out that that part of the code, yes, it was a true mess, but it was a working mess. It
was a mess that had been debugged into functionality and proven in use. And it was rarely, if ever, touched.
So that's when I kind of realized that complexity only matters in a context.
If we have a complex piece of code and we never need to touch it,
well, we probably have much more urgent matters.
So that means if I hear this correctly,
just looking at a static code analysis tool that shows you what complexity you have in which part of your code doesn't give you any clear indication if you don't combine it with things like how many problems ever occurred in this particular code, how often it is changed, and, I don't know, I'm sure there are other metrics too, right? So is this what you understand by context: adding additional metrics around just the technical complexity, like the business impact, the quality, and all that stuff?
Yes, exactly. So static analysis techniques are simple code inspections. They are great for identifying technical debt, or identifying problems in the code, but they are not particularly
good at prioritizing them.
So for the priority dimension, we would need something else, right?
And that's what I try to popularize with my books and presentations to kind of tap into
the behavioral data of the organization and look at where do the developers actually do
most of the work, and use that to prioritize.
So can you give us a couple of additional hints now? I know you covered this in your presentation, the way you were analyzing, as I mentioned earlier, Git commits and other things. But can you go into a little more detail here? When you approach an organization, what type of data does an organization, for instance, already have but they're not looking into? Or what information do you need for your forensics that organizations typically don't have but should have?
So, yeah, I try to use the simplest possible metrics, because those tend to have a pretty
good predictive value and they are also easy to explain and they tend to be fairly intuitive
once you get used to them.
So the most valuable data source, if you want to do code forensics or if you want to prioritize technical debt, is a data source that everyone already has.
And that's our version control system.
So my first step is always to tap into version control
because version control is something,
I think we as an industry,
we have mostly used it as an overly complicated backup system.
But then, almost as a side effect, we have built up this wonderful data source that kind of tells the evolution of our system. So I tap into that, and I start by calculating simple things like
the change frequency of each piece of code. How often is a particular piece of code modified?
And the reason I'm fascinated by change frequencies is because if you plot them in a simple graph, you will see that they form a Pareto distribution, which basically means, you know, like the 80/20 distribution, only it tends to be even steeper. And what that basically means is that most of your development activity is in a very small part of the code base, and most of your code tends to be in the long tail, which means it's rarely, if ever, touched. So those are the parts of the code where we can actually live with some code quality issues. But the head of the curve, the things we work with all the time, that's where technical debt starts to become really, really expensive. Because I like to view the change frequency of a piece of code as a proxy for the interest on any technical debt that we find there.
Does that make sense?
It makes a lot of sense.
Yeah.
And especially, so obviously, the code that has to be touched a lot. And this can, I mean,
I guess we need an additional dimension though, because I guess you need to answer the question,
why do people touch a particular piece of the code a lot?
Is it because it is really so central,
and this is where all the features get added?
Or does it have quality issues?
Or is it because of bad architecture
that everybody needs to touch the code?
I guess there's an additional component.
But yes, it completely makes sense.
Because if you can make that code base easier, it means that the developers in total, right, if you sum it up, need to spend less time, and therefore you have a much bigger impact. I guess that's kind of the way I understand it.
Yeah, I think that's a very good summary, and much shorter than my explanation.
I really like it.
This is why we call Andy the Summaryator.
Yeah, but I think that's really important because those parts of the code with the very high change frequency, that's what I refer to as hotspots in my books and in my tooling.
And I always claim that just because something is a hotspot,
that doesn't necessarily mean it's a problem, right?
You might pick up a hotspot.
It happens occasionally and the code looks pretty good.
In that case, you're in a very good position.
But on the other hand, even minor quality issues in a hotspot can amplify pretty rapidly due to the high change frequency
and become really expensive and hold back the whole organization.
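To make this concrete, here is a minimal sketch of such a change-frequency analysis, assuming Python and a local git checkout. It is an editor's illustration of the idea, not CodeScene's implementation:

```python
# Minimal sketch: change frequency per file, mined from git history.
import subprocess
from collections import Counter

def change_frequencies(repo_path="."):
    # "--pretty=format:" suppresses the commit headers, so the output is
    # just the changed file paths, with blank lines between commits.
    log = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    return Counter(path for path in log.splitlines() if path.strip())

freqs = change_frequencies()
# The head of this distribution is where technical debt carries interest;
# the long tail is rarely touched and can tolerate some quality issues.
for path, count in freqs.most_common(10):
    print(f"{count:5d}  {path}")
```

Sorting those counts in descending order and plotting them is what produces the steep Pareto-style curve described above.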
Do you have a chance?
So when you look at, let's pick Git as an example,
but if you look at the Git history,
you can only detect change frequency,
but you cannot really detect or calculate
how much time actually went into the change.
Only if you could combine it with other tools,
let's say a ticket-tracking tool like Jira or something like that.
Do you do that as well?
So that you combine the frequency with the effort that went into this particular change?
Yes, I do those kinds of analyses as well.
And many organizations indeed have the data necessary to do it.
So basically what I do is I pull in the data from, let's say, Jira,
which tends to be fairly heavily used. And I simply check not only how much time is spent on each hotspot,
but also what kind of work is done there.
And this is something really interesting because when you have a hotspot,
you want to find out, all right, is this a lot of activity
because we're implementing new features?
Or is the majority of the work bug fixes, which might indicate a different kind of issue?
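Here is a sketch of how that join might look, under two assumptions that won't hold everywhere: commit messages carry issue keys (like "SHOP-1423"), and an issue-key-to-type mapping has been exported from Jira beforehand:

```python
# Sketch: what kind of work happens in each hotspot?
import re
import subprocess
from collections import Counter, defaultdict

ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]*-\d+\b")  # e.g. "SHOP-1423"

def work_types(issue_type_by_key, repo_path="."):
    # \x01 marks the start of each commit record; --name-only appends
    # the changed files below the subject line.
    log = subprocess.run(
        ["git", "log", "--pretty=format:%x01%s", "--name-only"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    per_file = defaultdict(Counter)
    for record in log.split("\x01")[1:]:
        lines = record.splitlines()
        subject = lines[0] if lines else ""
        files = [l for l in lines[1:] if l.strip()]
        for key in ISSUE_KEY.findall(subject):
            kind = issue_type_by_key.get(key, "unknown")  # "Bug", "Story", ...
            for path in files:
                per_file[path][kind] += 1
    return per_file  # e.g. {"src/billing.py": Counter({"Bug": 31, "Story": 4})}
```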
I'm also curious, from the point of view of identifying a hotspot, and I don't know if I'll frame this well, especially never having been a developer. But I'm imagining
there's a point where you see a section of code that gets changed so often that it then lends itself to a change in the code, not because of bugs or something. But here's a pretty weak example,
but I don't know how to qualify this. Let's say you had an API input that took three variables,
and then another team comes along and wants to use that,
but they need four, so then you change the code
and allow it to accept four.
Someone else then needs to accept five,
and you keep making these changes,
and you're noticing all these changes
until somebody finally comes around and says,
hey, we're seeing all these changes
because we're putting a hard limit
on the amount of inputs into this.
We can refactor this to take n number of inputs
and then we don't touch the code anymore.
Then suddenly that no longer becomes a hotspot
because you've refactored the code
to handle what the typical request changes come in.
Is that a scenario that plays out, or just something that...
Yeah, yes, it can definitely happen, right? So what you're referring to, just so I get it right, is basically that some code, a hotspot, tends to cool down for some reason?
Correct.
See, now you did the perfect summary. So yes, it happens, but I have to say it's a relatively rare case.
And what I tend to see, and this is the beauty of version control data, is that you immediately have access to the full history of the code base, right? So when I find a hotspot, I'm also looking into the trends. I look at, all right, how rapidly has this hotspot grown? How quickly has it accumulated code complexity?
And what you tend to see is that if you pick up a system, the modules that are hotspots now, they were most likely an issue already a year ago or maybe two years ago.
And I think that's where the people side might come in.
Because what tends to happen with hotspots is they also tend to be magnets. They tend to attract many different developers and many contributions from different developers.
And that makes it very, very hard to refactor and act upon the information in some scenarios.
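Those trend questions can be answered from version control alone. A sketch using the indentation-based complexity proxy from Adam's books, where deeper nesting shows up as more leading whitespace; the file path is illustrative:

```python
# Sketch: the growth trend of a single hotspot over its history.
import subprocess

def revisions(path, repo_path="."):
    # Oldest first, one "<sha> <date>" pair per commit touching the file.
    out = subprocess.run(
        ["git", "log", "--reverse", "--date=short", "--format=%H %ad", "--", path],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    return [line.split() for line in out.splitlines()]

def indentation_complexity(sha, path, repo_path="."):
    src = subprocess.run(
        ["git", "show", f"{sha}:{path}"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in src.splitlines():
        if not line.strip():
            continue  # blank lines carry no complexity
        expanded = line.expandtabs(4)
        total += (len(expanded) - len(expanded.lstrip(" "))) // 4
    return total

for sha, date in revisions("src/hotspot.py"):  # illustrative path
    print(date, indentation_complexity(sha, "src/hotspot.py"))
```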
So it's like hotspots are a black hole. They are sucking in all the additional resources of people that try to solve the hotspot or just get attracted to the complexity.
That's kind of interesting.
It could be, right?
Yeah.
It could be.
I mean, there are, of course, cases where the hotspots look perfectly clean.
But more often than not, the reason a piece of code is changed
frequently is because it has good reasons to do so.
And a typical reason is that it has too many responsibilities.
It's low in cohesion, right?
So as a consequence of that, it also tends to attract contributions from many different
developers, potentially different teams working with different features, but all ending up
in the same hotspot.
So those hotspots tend to become coordination bottlenecks as well.
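The people side can be mined the same way. A minimal sketch that counts distinct authors per file, again an illustration rather than CodeScene's algorithm:

```python
# Sketch: how many different developers touch each file? Many authors on
# a hotspot hints at a coordination bottleneck; one author, key-person risk.
import subprocess
from collections import defaultdict

def authors_per_file(repo_path="."):
    log = subprocess.run(
        ["git", "log", "--pretty=format:%x01%an", "--name-only"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    authors = defaultdict(set)
    for record in log.split("\x01")[1:]:
        lines = record.splitlines()
        author = lines[0] if lines else "unknown"
        for path in (l for l in lines[1:] if l.strip()):
            authors[path].add(author)
    return {path: len(names) for path, names in authors.items()}
```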
What about, and maybe this was touched upon earlier, but what about seeing the correlation between two or three hotspots? Meaning, the example from Brian, I thought, was pretty good.
If you make a change to an API, you may also need to make a change to whoever calls the API.
And maybe you have multiple clients of the API.
So with every change of one API, you need to change multiple clients.
Do you also do that kind of dependency analysis between hotspots?
I do something very similar that helps us answer those questions.
I do something I call a change coupling.
So change coupling isn't a traditional dependency analysis.
You know, in a traditional dependency analysis, we look at properties of the code.
So we look at which parts of the code depend upon each other, which parts of the code use each other. What I do instead is that I look at the behavior of the developers.
So I turn to version control and I see when a particular team or an individual developer touches this piece of code,
they also tend to modify these other parts of the code.
So I kind of uncover the patterns of the developers.
And that helps to highlight those change patterns where you might modify an API, then you go and modify a service,
and you maybe modify your data model.
You can pretty much uncover the change patterns in your code base.
And this is something I use to reason about the cost of change, the change impact.
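A change-coupling sketch in the same spirit, pairing up files that appear in the same commits. Real tooling would cap very large commits and use a smarter ratio; this illustration keeps it simple:

```python
# Sketch: change coupling -- files modified in the same commits, whether
# or not any static dependency connects them.
import subprocess
from collections import Counter
from itertools import combinations

def change_coupling(repo_path=".", min_shared=5):
    log = subprocess.run(
        ["git", "log", "--pretty=format:%x01", "--name-only"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    revs = Counter()   # how often each file changes
    pairs = Counter()  # how often a pair of files changes together
    for record in log.split("\x01")[1:]:
        files = sorted({l for l in record.splitlines() if l.strip()})
        revs.update(files)
        pairs.update(combinations(files, 2))
    # Coupling degree: shared commits relative to the less-changed file.
    return {
        pair: shared / min(revs[pair[0]], revs[pair[1]])
        for pair, shared in pairs.items()
        if shared >= min_shared
    }
```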
Oh, so that means you can even do predictive things.
You can say, if you are going to change this, you have to change that many lines of code in other components, and it's going to take you that much time.
Is that what you can do too?
Yeah, in fact, in CodeScene we actually have a CI/CD integration where we can hook into a continuous delivery pipeline.
Yeah.
And one of the things we do there is that we have knowledge of this, right?
Since we already scanned the code base, we know that
this cluster of hotspots, they tend to be changed together, right?
So in the CI/CD pipeline, we can actually detect omissions. So we can say things like, all right, you modified this particular hotspot.
When your colleagues do the same thing,
they tend to modify this other piece of code as well.
Did you forget about it?
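Given a coupling table like the one sketched earlier, the omission check itself is small. A hypothetical version, fed with the change set from `git diff --name-only`:

```python
# Sketch: warn about files that usually co-change with the current
# change set but are missing from it this time.
def missing_companions(changed_files, coupling, threshold=0.7):
    changed = set(changed_files)
    warnings = {}
    for (a, b), degree in coupling.items():
        if degree < threshold:
            continue  # only warn on strongly coupled pairs
        if a in changed and b not in changed:
            warnings.setdefault(b, []).append(a)
        elif b in changed and a not in changed:
            warnings.setdefault(a, []).append(b)
    return warnings  # {possibly_forgotten_file: [files it usually follows]}
```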
Oh, that's pretty cool.
So wouldn't that,
so you said you integrate this into your CI,
wouldn't that be,
are you also integrating that with your IDEs,
with the dev tools?
Wouldn't that be even cooler?
If you can say, hey, Andy,
you're changing this class,
don't forget about these other five classes
because Adam and Brian have changed them
every time when they touch this code.
Wouldn't that be a cool integration into the IDE too?
It would be a wonderful integration.
And if you go back to my previous book,
Your Code as a Crime Scene,
which I wrote in 2014,
I actually think I have a section where I talk about that integration as
a direction for the future, right? And I still remember when I wrote that, that I thought that,
all right, this is something I'm going to have in a couple of months. And I'm still not there,
but it's on my roadmap. You know what that sounds like?
Which kind of highlights the brilliance of it is, I don't know if you all use,
I know Andy uses Outlook,
but a lot of times in your email clients,
you'll make a reference to a picture
or a file or something.
And if you go to hit send,
it says, hey, did you mean to attach a file?
You know, to me,
this is the same sort of application of that.
We see what you're doing
and we're noticing what other things,
you know, you're basically looking at
what are the patterns of things that occur
when X action is taken
and making those suggestions to do those others.
So it's just a more complex
and more code level reference system to that.
I think that's really, really, really cool.
And it also shortens the feedback loop because if
I'm checking in code and then
the CI runs and it takes 5, 10, 15,
20 minutes and then I get the notification
that what I did 20 minutes ago
I forgot something, it would be
much better if I get immediate
feedback in the IDE because that actually changes
my behavior and it teaches me
something, right? I mean, that's the
thing.
But even in the CI/CD pipeline, getting that notification
can help shorten that time to resolution.
That's true.
But in the sense of shifting all the way left,
in this case, it would actually be possible to go all the way into the IDE
and say, hey, Andy, don't forget to change this file here as well
because you're most likely going to change it anyway later on.
So we can expect your submission to GitHub on this, Andy, in a month?
Yeah, exactly.
It is, in fact.
I think it's a really important idea because, like you pointed out,
I actually got this idea from recommendation engines.
I buy a lot of books.
I tend to buy them from Amazon.
And they always had this feature there, right?
Customers who bought this book also bought this and that book.
So that's how I started to think about it.
But the reason to plug it into IDEs would be not only as early warning system, but also would be very useful as a code reading tool to familiarize oneself with the code base, right?
Because I'm pretty sure that writing code is not the main problem we have in our industry.
Our major problem is to understand existing code.
And any help we can get there would be a huge win. So that's why I would like to go to the
IDEs, to kind of help us navigate an existing code base based on the collective intelligence of our colleagues.
So basically what you explained earlier,
10 years ago when you started your project, you did your analysis and then you walked over to the folks that actually had, let's say, tribal knowledge about that code. And what you're saying is that the tribal knowledge is in our existing tools, in our version control, in our Jiras, and we can automatically bring it to every developer that starts with a new code base. That's basically what you're saying, right?
Yeah, that's what I'm saying. Although you won't get perfect information, because you will always have contextual knowledge, right?
Of course, yeah.
But you can get pretty far. And I think
the edge cases I've come across are things like, you know, you come across a really nasty hotspot, and you talk to the people who develop it, and they tell you that, all right, yeah, it's a problem, but we have a replacement ready.
We're going to replace it with this new library in two weeks, right?
So in that case, it doesn't make so much sense to focus on it.
But in general, I think using these techniques for onboarding is something that has saved me weeks and months throughout my
career. You can get into a new code base surprisingly fast.
Yeah. Hey, Andy, here's an idea. I don't know if it's, I think it's a couple of steps away still, but imagine if you had
a robust tool monitoring your production environment, which understood all the service
dependencies, right?
Now, this is not an ad.
You find an issue with one of your services.
You understand which services make calls into those services so that you can understand
the dependent services.
You tie that data from those services back to the code dependency from a code scene, right?
So that you can understand when an alert is raised in production, it might be a certain service call,
which can then say this is reliant on XYZ services that are making calls into it.
And here is all the bits of code that you might have to account for on each of those services if you make a change to improve this code.
Basically tying both data sets together
into one big, giant, amazing output.
Yeah, so basically what you're saying,
we have to build an integration between CodeScene and Dynatrace.
Basically, yeah.
That would be fantastic, right?
So pretty soon, in a couple of weeks, we're going to start opening up our APIs, so you will have all the data you need.
Perfect. Now, that's really cool, because, Brian, I mean, this is not a commercial now necessarily, but I mean, modern monitoring
tools obviously have a lot of data especially now those that are doing end-to-end tracing. So we have a lot of dependency information about the problem.
And we end up coupling this when we send out the alert, via Slack or whatever tool we are going to integrate with. And then enriching the problem data, which already includes code-level information, with the data from tools like yours would extremely benefit the team that has to actually then work on the fix.
I mean, that's a really cool thought, actually.
I like it.
Yeah, and I think it fills an important gap as well, right? Because my experience with
many modern microservice implementations and our service based implementations in general is that,
you know, while each individual service might be fairly easy to understand in isolation,
the emerging system behavior is anything but simple.
So, yeah, I agree with that.
Yeah, cool.
So, Adam, I know we mentioned the product CodeScene. We will put out the website link. I think it's www.empear.com. Is that the way you correctly pronounce it?
Yes, that's the name of my startup. So there are actually two CodeScenes, which might be confusing. We have the on-prem version, which you host yourself, and then we have codescene.io, which is cloud-based.
So that might be another link where anyone who's interested can try it out for free.
Cool.
And I think the great thing about products like yours
is that you have built this based on your own,
I don't want to say suffering now,
but basically you suffered in this situation.
You come up with a solution.
And then you decided that, hey, this is actually
something that is really useful.
And let's build a tool that helps you and then
helps others as well
and this is why, you know, tools like this will really have an impact on the way we do software engineering and the way we are going to think about software quality and better architectures. So I would assume, and again, a little commercial I think is great,
because you are doing us all a favor by sharing your experience here.
So I assume the way CodeScene works, you just tap into the APIs of, let's say, a Git or a Jira.
So that means I just need to point it to the tools, and then you pull out all the data and do your magic?
Yes, that's basically it. You specify your Git URL, could be Bitbucket, GitLab, anything, then press a button, and that's basically it.
Cool. And then you are doing your hotspot analysis, the stuff we discussed earlier. And I mean, I just looked at the website, and I think you had a couple of, I'm not sure how many, screenshots I remember from your presentation,
but there were some really interesting things
on how you then visualize this forensic data
and then how you can drill in.
Do you give, does your tool also give recommendations?
Because as you said earlier,
sometimes you can be misled by data.
So I assume you do not only analyze the data, visualize it nicely,
but also give specific recommendations on this is what you would do?
Yeah, so that's an area where I know that we can improve.
Today, I would like to simplify the recommendations.
They're definitely there.
If you find a hotspot,
I mean, CodeScene will present them to you.
CodeScene will rank them to you.
You click on it, and you can immediately see a list of the major issues that we have found
together with our recommendations.
But we could make that even
better. So that's one of the things I'm actually working on right now.
Very cool. And now this is, I mean, I'm looking at the website right now. There's a lot of cool
things here. I see the social patterns you talk about. Obviously, developers can write better code faster together, find the hotspots,
the hidden risks.
That's also interesting, right?
I think risk is something that is just as important, because you can predict the risk of a code change, right?
Yes, that's correct.
That's also one of the main use cases I see for these kind of techniques in continuous
integration, right?
So what tends to happen in many organizations, particularly large organizations, is that
code reviews tend to become bottlenecks.
So the idea is that if we can predict the risk of each change set, then you can use
that to prioritize code reviews as well.
Right. So you have a high risk change. You probably want to put two people on
verifying and inspecting that. And then you can have lower risks that you can roll through a
little bit more quickly so that you know that you use your time wisely in the code review process.
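To illustrate the idea only, CodeScene's actual risk model is more sophisticated than this, a toy risk score for a change set might weigh diff size, hotspot overlap, and author familiarity; every weight below is a made-up assumption:

```python
# Toy heuristic: score a change set from 0 (review lightly) to 10
# (review carefully, maybe in pairs). "freqs" is the change-frequency
# Counter from the earlier sketch.
def change_risk(changed_files, lines_changed, freqs, author_commit_count):
    hotspots = {path for path, _ in freqs.most_common(20)}
    hotspot_share = len(set(changed_files) & hotspots) / max(len(changed_files), 1)
    size_factor = min(lines_changed / 500, 1.0)        # big diffs are riskier
    experience = min(author_commit_count / 100, 1.0)   # familiarity lowers risk
    score = 0.5 * hotspot_share + 0.3 * size_factor + 0.2 * (1 - experience)
    return round(10 * score, 1)
```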
I didn't even think about that. That's even, that's awesome. Yeah, because you're right. As you scale your development organization, and if you're enforcing code reviews on everything, and if you don't have any differentiation between what's the priority or what's the risk with a code change, and you always assign the same number of people, you won't be able to scale because you're spending too much time in, let's say, quote unquote, less important code reviews versus more important ones.
And that's, I need to write this down.
This is really cool.
Yeah.
And I also think that there's a limit to how much code we can review each day, right?
So after a while, we will start to slip and there might be some serious issues that just
go through because we're not paying attention.
We are too tired.
This is really cool.
Andy's taking notes.
No, I'm really taking notes.
Yeah, I know.
I'm wondering when the blog will come out. So, Adam, this is one of the great side effects
of our podcast: we get to talking, and it gives Andy a bunch of great ideas to go, you know, off and run and play with and explore. And it's always great to hear when Andy gets sparked or inspired.
But whatever I do with it, obviously I give you all the credit. I'm not going to be somebody that just takes an idea from somebody and sells it as my own. But no, this is really cool.
Adam, I want to jump to an interesting comment you made in the very beginning, when I asked you for three things. You said technical debt, you said you cannot separate the technical side from the people side. And then the last thing you said was: improvements, only tie them to the business value.
And obviously, I believe this is what resonates a lot with decision makers, right, in the end.
So, you know, what do we really improve? Can you give us a quick idea of how you can quantify the business value of a change
just by looking at the data that you have right now? Is there any, or is there additional data
we need? Is there additional input people need to tie an improvement or a change to the business
value?
Yes. So, my experience, because I'm often in that situation, I'm potentially the most frequent CodeScene user in the world, right? I use it every single day. I do services around it and stuff, right?
So I'm often in that situation where I have to explain something deeply technical, code
quality issues, technical debt, and I have to explain it to non-technical managers.
So one thing I've found that works pretty well is to show the hotspot visualizations that we do,
because those are almost like a map of your system, and that kind of shows the importance of your different parts of the code. And if you combine that with the trends for each hotspot,
so you can see the evolution of properties like complexity within the hotspot, the number of developers that have to work on it,
then it's usually a pretty easy sell to non-technical managers.
They tend to realize that, all right, we really need to act upon this, right? And
to me, that has been one of the most important use cases, because now you can suddenly tie together our deeply technical, usually inaccessible world of code with the business people.
But I do think that to really have something that you can measure in time and money, I think you have to go for things like cycle times and lead times.
So you're most likely aware of Accelerate that came out last year, right?
Yeah.
So I think that book has been, I think it's one of the most important books over the past years.
Because they're actually showing this causation between the different cycle times and the business values like increased profitability, customer satisfaction, all that good stuff that business people want
to have.
Right.
So if you could set the baseline that these are our cycle times, let's say this is the time it takes from when a customer detects and reports a defect until we have it fixed in production, right?
That would be a really good measure.
And see, can we cut that in half by paying off the technical debt
in these critical parts of the application?
Yeah, so in the end, the business value in this case is really about efficiency: time to market, time to remediate, as we call it. I think there are two metrics that I've used in some of the work that I've done: MTTI and MTTR. MTTI means the mean time to innovate, so how long does it take us to innovate versus spending time on, as you said, you know, bug fixing and working on technical debt. So MTTI, and then also MTTR, the mean time to remediate.
These are exactly two of the metrics that are very easily understood
by the business.
And if you can then show with your analysis how to give developers
more time to innovate, that's one great thing,
and also how to improve code quality
in order for developers to react to problems faster.
Then this is, yeah, it makes a lot of sense.
Okay, very cool.
Very nice.
And just for reference, the book Accelerate you mentioned,
I'm assuming that is the science of lean software
and DevOps by Nicole, I hope i get this name right for for
forsgren uh jess humble and gene kim is that the uh one you're referencing yes that's correct
right so for anyone listening that has been officially dubbed now it's the most important
book yeah we had we had gene kim on the podcast. How was it? Two years ago?
It was a while ago.
Two years ago.
And he was talking about the DevOps Handbook.
That was pretty cool.
And yeah, Nicole, she's with DORA, right? She's been doing all the State of DevOps report analysis, and she's been driving that.
And yeah, the findings they came up with are fascinating, and they definitely help us better communicate to the business how important it is to invest in quality, in automation, in working on technical debt.
Yeah, and what I really admire
about their work is that they actually managed to put some real science on software. Because we do have a research community in software.
We do know things about software.
We do have data to show it, but there tends to be a huge gap between research
and practice, and Accelerate, I think, serves as at least a bridge for
some aspects of software that turn out to be really important.
Very cool.
Adam, is there anything else that a listener should know?
Or did we cover at least the basics of what people need to know when it comes to leveraging and harvesting the data that we have
lying around but don't harvest correctly, in order to figure out where our hotspots are, you know, using it for risk analysis, as you said. Is there anything else listeners should know before we let them go? And then, hopefully, they explore CodeScene, your blogs, your books.
Yes, so I think there's one really important thing that I'd like to add, and that is that all the kinds of data we have been talking about, all the tooling and visualizations, they never stand on their own. They need something as a complement, and that complement is you, dear listener.
So these techniques are there to kind of help you focus your attention and expertise to the parts of the code that are likely to need it the most.
But the decisions and actions are always going to be yours.
So I think that's important to point out.
That's true.
Let's say, what's it called?
It's an action call, a call to action.
Call to action, yeah.
Exactly.
So speaking of calls to action, should we call the Summaryator to action?
I think we should.
Absolutely, yeah.
Let's do it.
All right. Well, I've been fascinated when I listened to you in Romania, where you kind of opened up my eyes on what we can do with data that I didn't even know all of us have. The rich history of the Git commit history, data that we have in our change management system, in our version control, in our ticketing system. So what I learned today, and hopefully everybody learned today,
is that detecting hotspots is important,
but like what you said earlier,
just detecting a hotspot purely based on, let's say, technical complexity
is not enough.
You need to give it context.
So I really liked what you did with the change frequency,
so figuring out obviously what changes a lot
but then correlating this with how much time developers actually spend on things: are these bug fixes, is this new development? So give the data more context, because just technical complexity based on a static code analysis tool
doesn't really tell you whether it is a smart move
to change that code if it really doesn't have
a whole lot of impact in the end.
I also really liked what you said about the behavior of your team members.
So if somebody changes something here,
then most likely you're changing something in another part of the code.
So kind of seeing team behaviors, using this also for recommendations in the future.
We have a feature request here for you: putting it into the IDE.
And then I think the biggest thing in the end, though, is you will probably, unless you have a lot of energy and you can do this as a grassroots movement from a technical side, you should most likely first go to the business and show them what can be improved in terms of becoming a better organization, delivering better quality, reducing cycle time, improving efficiency.
So selling to the business first is a big thing.
And if people want to try out the stuff that you've been writing about,
the stuff that you just heard, go to empear.com,
check out CodeScene as one of the products out there.
And then if everyone ever has a chance to see you live on stage,
I can just encourage everyone: it was a real pleasure seeing you there, and I learned a lot of stuff. Thank you so much.
Thanks a lot for your kind words, and thanks a lot for having me here. I really appreciate it.
Thanks.
Hey, Andy, for once I had a couple of thoughts, too, that I want to throw out there.
Of course.
No, sometimes I do.
First of all, everyone needs to realize that you're surrounded by information or at least data, right?
And some of it you may be aware of that's out there.
Some of it you may just look at as noise.
But I think it's always worth looking at what you have access to, because what Adam and team did is, you know, there's this wealth of information sitting in GitHub and sitting in other tools, where if you take a look at it and really start thinking about what you can use, and spend the time to see what you have available and how you might be able to apply it,
you might be sitting on a goldmine, not just financially by saying, hey, we're going to start a company, but also for a goldmine of information to help you improve your processes.
And if you then use that information to improve your processes,
you can free up more time to take a look at what's available to do this. So it's a cycle that feeds into itself. So don't ignore all the various data points that you have access
to. Obviously, you don't want to use them unnecessarily, but see if you can bring value
to them. The other thing I wanted to just kind of bring up
was kind of more like excitement for the near future,
hopefully near future, I hope.
When we take a look at tools like CodeScene, Dynatrace,
I want to also reference back to Akamas and Probit. So Akamas is the one, Andy, if you recall, where they're going to use some AI engines to tweak, let's say, your JVM settings to get performance improvements.
Probit is the company that's taking RUM data and generating Selenium scripts, and possibly in the future maybe some load-testing-type scripts. What I'm really excited to see is that all tools now are being developed with this accessibility in mind, this API accessibility, this ability to get data out so that others can ingest it.
And just thinking in hopefully the near future, taking all these data points, as we mentioned earlier, the idea of marrying some of the Dynatrace data with the code scene data.
But also when an issue occurs, maybe you plug that into Akamas,
you know, you have the Akamas context as well
to see, okay, there's an issue,
code performance was poor,
but is that because of the code
or is it because of a setting on the JVM
or the container, whatever might be in there,
the container, the OS,
and just the marriage of all these tools, right?
They're all separate companies,
but because of the openness of it all, there's nothing that prevents everyone
from using the data in a cooperative way.
And I'm really excited to see
how this might really play out,
not just in theory, but in practice in the future.
And it's because of all these wonderful tools
people are building that we might see some of this.
Mm-hmm. And maybe as a last thought for Adam: while obviously it would be great to have an integration with Dynatrace, you should look into the work around OpenTelemetry, which is the open source, the open standard around tracing and monitoring, which Dynatrace is also part of. But if you are really intending to build integrations in the future, then you may definitely want to look into OpenTelemetry.
And I know, Brian, I did the recording with Sonia and with Danielle on OpenTelemetry, also talking about OpenTracing and Trace Context. So the recording is out there, and I think it just makes a lot of sense to also broaden your scope of what data you can ingest by, you know, adhering to these standards and looking at what OpenTelemetry can give you.
Hey, Adam, last thing.
Do you do social media?
Do you have anything else you want to promote?
Do you do any appearances
coming up in the fall that
you're aware of that you might want to share?
Yeah, sure. So you will find me on Twitter as @AdamTornhill. I'm blogging a lot at empear.com, and I also have my personal blog at adamtornhill.com.
And yes, I'm speaking at several conferences
this fall.
I will be keynoting Øredev
in Malmö, Sweden.
I will also go to Copenhagen
and a bunch of other conferences.
So I hope to see some of you around there.
Awesome.
Well, thank you very much for being on the show, Adam.
We look forward to seeing what else you come out with and we'll be sure to be following you.
Thanks.
Thank you.
Bye-bye.
Yeah, thanks a lot for having me here.