PurePerformance - 071 Lessons learned when breaking a Monolithic Healthcare System with Brett Hofer

Starting point is 00:00:00 It's time for Pure Performance. Get your stopwatches ready. It's time for Pure Performance with Andy Grabner and Brian Wilson. Hello, everybody, and welcome to another episode of Pure Performance. You know, in the last episode I might have given everybody a scare by announcing that Andy might have been abducted by Der Kommissar. Hopefully somebody caught the reference and tweeted in about that. But we record these ahead of time. Anyway, I actually flew over to Austria, went to Falco's house, and there I found Andy. So Andy is back. Hello, Andy.

Starting point is 00:00:54 How are you? Hello. I need to listen to this episode. Obviously, I haven't heard it yet because it's not out yet. So Der Kommissar, was that a hit in the US too? I believe so. Yeah, there was another band that did an English version of it. But then you had some of the cool radio stations playing the Falco version, where you couldn't understand a word of it. But he had a really groovy video: 1980s, cocaine, all that stuff.

Starting point is 00:01:22 He's a legend in Austria still. Unfortunately, he's no longer with us, obviously, but, yeah, great stuff.

Starting point is 00:01:28 Great, great. No, but I'm back. Yes. Welcome back, Andy.

Starting point is 00:01:33 We have a good show today, right? As always. I hope so. Yeah. I think we can already hear him a little bit in the background, because he tried to find a quiet room and a headset. Hopefully.

Starting point is 00:01:42 If only I had a really quiet room. Our guest today is Brett, and I think Brett was with us once or twice already. Brett, is this the second or third time for you on the show? I think this is the second time. Yeah, I think it's the second as well. We've done many webinars, so it gets blurry. That's true. Hey, Brett, for those folks that don't know you, a quick intro, maybe what you do right now. And then we'll focus on the stuff I learned from you a couple of weeks ago, when you were visiting Austria and, at the end of the week, we were sitting on my balcony talking about all sorts of things. And then I learned a lot about

Starting point is 00:02:22 your previous professional life, which is why we brought you on the show today. But maybe let's get started with who you are and what you do right now, and then we will dive into the topic. Sure. So, as I said, I'm Brett Hofer, and I'm actually a managing practice manager for Dynatrace Services, specifically targeting enterprise conversions. I used to primarily do DevOps engagements, working with our enterprise clients on how to build those kinds of optimized pipelines. And it has really blossomed now into a team of world architects, where we're going to be primarily focused on that plus cloud native and architectural guidance

Starting point is 00:03:08 for our larger enterprise customers. That's cool. And how long have you been with Dynatrace now? So I will have been with Dynatrace going on six years this November. That's amazing. Yeah, congratulations. Yeah, that's cool.

Starting point is 00:03:23 Yeah. Hey, and when we sat down here in Austria, you were actually over for a workshop we did on some of the use cases and services we are offering to our customers, mainly the larger enterprise customers. Without going into further details, we sat down at the end of the week, as I said earlier, and you started to explain what you did in your previous life. I was interested because, even though we've known each other for a while, I guess we never really talked a whole lot about our pasts. And then you told me, and I guess you can tell it much better. So do you want to tell us what you did before Dynatrace, what types of projects you started, and then the one big project that we want to focus on? But let's get started: what did you do before Dynatrace? Yeah. So before Dynatrace, if we're focusing on the job that we were talking about, prior to me joining, it was really all about

Starting point is 00:04:26 breaking down the monolith, which has really made me think about this whole story, because this was the granddaddy of monoliths. I was the senior application manager for all of WellPoint's customer service desktops nationwide. That was the project where I inherited almost three years' worth of coding that had turned into a really tough business challenge for the folks there. I was asked to step in and help fix it, and I quickly found out how monolithic this application was, and so the story started to get written. That's really where I had to do a lot of the things that you talk about, and then we went into a lot more detail on some of the things I did to fix it.

Starting point is 00:05:31 First of all, I'm not sure everyone who listens knows WellPoint. Maybe you can give a little more context on what WellPoint has been doing or is doing now. Yep. So WellPoint was actually a conglomerate of 14 Blue Cross Blue Shields. It's basically the largest health insurer in the United States, which is now Anthem. At least at the time when I was there, we were insuring 35 million members. The application was specifically for all of the customer service desktops, to be rolled out to 8,000 desktops nationwide. So we had nurse care managers. We had three different divisions, commercial lines, personal lines, and federal lines, where we basically had all these people who serviced people calling in and asking about claims. "Do you cover this?" People from the hospital, behavioral health. I mean,

Starting point is 00:06:21 you name it, this piece of software was what sat on the desktop in front of these folks while they were taking calls live. So performance was huge. Access to things was huge. Lots of changes were constantly going on. But WellPoint, which is now Anthem, is certainly a very well-known health insurer. And then you said there was already a project underway for about two to three years. What was the goal of that? How did it change? And why did it get to a point where they had to call for help?

Starting point is 00:07:00 Yeah, so what happened was, and the funny part is, those three years were themselves the result of a call for help in the very beginning. They had a very large firm come in and help them build this application, and I think it was almost as though this firm did not realize the magnitude of the challenge. It was all done in Java, J2EE, and they used Rational. And because of the way health insurance works, you have all these different back-end lines: you have your memberships, you have your products, you have your plans that they sell to these different companies. You have the claim systems, and you have hooks into Rx, or pharmacy. And then, of course, you have every different type of customer, from corporate to federal

Starting point is 00:07:52 employees to regular personal-lines customers. When you have such a huge challenge and all these requirements, they were essentially trying to start from scratch and build the ultimate servicing application for the company. But I don't think it ever reached the architects how big this thing would get. So over two to three years, they didn't really create modules out of this. It was just built as one big WAR file that kept getting bigger and bigger.

Starting point is 00:08:31 And they would have their modules for each one of those different types of components that I explained. So this service company came in and built everything from scratch. But what about the existing system back then? Was nothing reused? Did nobody think about

Starting point is 00:08:48 slowly building something new on top and reusing some of the existing services through APIs? Or is there any background you still remember on how that worked? Yeah, no, because it was actually handled by numerous different applications in the service organization. So this was a completely new idea: creating one unified desktop that managed everything and could track calls from beginning to end. It had different types of telecom hooks, so that when a call came in, it would answer, give the agents notifications, and be proactive. So it was definitely a bleeding-edge business idea, which sparked this innovation. And plus, remember, as I mentioned, it was a conglomerate.

Starting point is 00:09:34 So WellPoint had just finished acquiring 14 different Blue Cross Blue Shields. So which one's system were they going to pick? They needed something to unify them. Yeah, that makes sense. Can I just butt in here for one second? Because I'm listening to this and I'm a little bit fascinated. I heard it in the beginning, but as I'm hearing the story, I realized I didn't retain it. You started with Dynatrace six years ago. So this was, what, like eight years ago you were doing this? Yes. So this was WellPoint, four, five. Okay. So just putting that

Starting point is 00:10:06 in perspective: when we're talking about breaking up monoliths, the reason I wanted to bring this up is that this was well before that big trend started. That kind of impressed me, and I wanted to reconfirm that point. Yeah, and actually they thought they were breaking up the monolith, but back then it was all about services. Everything was SOA architectures, right? So the thinking was, well, we would break it up by federating, funneling everybody through a big, huge enterprise service bus. And in their minds, the access to the back ends was the breaking up. But they didn't realize there was so much consumption on the front end that

Starting point is 00:10:45 the front end has to mirror all that data. So they didn't think about breaking all of that down into separate components from a front-end perspective. So after two and a half years, when they called you, and I think later on we should also go back a little to why they called you, because obviously you already had a good reputation within the company from a different project you did there. But when they called you after two and a half years, what was the status? Why did they think they needed help from somebody else?

Starting point is 00:11:17 Yeah. So, as you mentioned, prior to being put into that position, I had been charged with converting all of the customer service telecom, the 800 numbers. These were the 800 numbers that actually called into the desktops. Believe it or not, they had over 15,000 of them. And I was asked, over a two-and-a-half-year or two-year period, to convert from one carrier to another to get one unified carrier for all their 800 numbers. So I was very familiar with the landscape, the call patterns, and all of the different divisions. I had a lot of political and networking connections throughout. So because I knew that landscape

Starting point is 00:12:06 so well, and I knew applications really well, that's what put me in the spot to assist on this. And that's when I sat down in a meeting, and when it happened, I went through some very, very aggressive steps to find out where we were, how many people we had on it, how it was structured, and how we were going to fix it. So what was the status quo that you were presented with? What were the things you found out? Primarily this: what created this monolith is that, for over two and a half years, they had been continuously building on this WAR file, which just kept getting larger. All of the screens for accessing, let's say, claims or membership information or plan information, all these different things were just being piled further and further into the system. And, you

Starting point is 00:13:05 know, the interesting thing about enterprise projects that are this big is that, as an app manager, you're really faced with project requests, as you can imagine. Remember, I mentioned that there were really three major lines: commercial lines, personal lines, and federal lines. Each one of those is represented by different stakeholders. Then the funding would come in, and as the app manager you're given a set of resources that have to manage all of this and get what needs to be done collected, tallied up, and delivered within a quarter. And you had to make the quarterly releases, because everything else in the enterprise was being released on that quarter. So you had to go in and you had to finish. So these were some of the big things. And as this thing grew, build times became very important. The amount of automation and testing became very important.

Starting point is 00:14:20 But also, because of all that pressure, the developers concentrated primarily on the next quarter's release. Meanwhile, your technical debt is just piling up. We were finding things like 100,000 exceptions or errors accumulating in the logs every five minutes. One issue would just keep boiling up, and then it would go into the next release and into the next release. Pretty soon the only thing they were concentrating on was new functionality, and the issues list just kept growing. So I walked into a big, big nest of issues. And you stayed.

Starting point is 00:15:00 I stayed, yeah. Just doing my job. Yeah. So, to reiterate, if I hear this correctly: because of the quarterly pressure of delivering against the budget that was given, and against the features promised for the quarter, the engineering teams obviously focused on delivering new features, probably cutting back on writing automated tests, cutting back on performance tests, and cutting back on, as you said, analyzing the log files and getting rid of exceptions and stuff like that. And then over quarters and quarters, over the course of two and a half years,

Starting point is 00:15:42 obviously a lot of stuff piles up, just because at the end of the quarter somebody wants to deliver, and has to deliver, a promised feature. And so, yeah. Well, there's maybe a little bit more to it, even. At that point it wasn't so much the development team skipping things like, let's say, automated testing. See, we had many different departments, and we had silos, and we had "throw it over the wall." Take automated testing, for example. It's not that the developers were bogged down in automated testing. The problem was keeping communication going between a monolithic app and its changes and the automated testers, telling them what they were going to have to automate, because the code was changing so rapidly and they'd have to wait for it to be deployed

Starting point is 00:16:23 into an environment before they could actually run their automated tests. So just managing it was hard. These were budgets of 10 million a year, 20 million a year. Very big budgets. So as an app manager at the time, you could throw as many bodies at it as you wanted. They were just like, "Just get it done." But you can't manage that. The size of it was getting out of control because it was one big monolithic app, with three-hour build times and everything. It was tough. Were they taking shortcuts on some of the testing too, or was that at least getting done? Well, this is what I witnessed once I was on the ground. This is what I was faced with when I stepped into the role, and it was basically, "We need you to help fix it." So, as Andy was asking,

Starting point is 00:17:25 that's really what I ran into. That's the role I stepped into. In fact, I had a staff of almost 70 people, onshore and offshore, at the time I stepped into the role. Yeah. Before telling us what you did: why did you take that job? Because I had won three awards on the last job I did, and the VP at the

Starting point is 00:17:48 time, and I'll give a shout-out to Denver Wall Brown, he knows me, said, "I know you like challenges." So, okay. That was my VP at the time. Yeah, there you go. So then what did you do? Okay. So when I first got on, my VP said, "You've got your development staff there in Richmond, Virginia." I said, "Well, it looks like I'm going to spend a lot of time in Richmond." So that was the primary group. I was in Connecticut at the time. So I flew down to Richmond, Virginia, and I gathered all the lead engineers for the various components,

Starting point is 00:18:38 and I put them in a room. I think I shocked them a little bit, because my predecessor had not had nearly as much technical knowledge. I have a very deep coding background, and I brought them all into a room, sat them down, and said, "Okay, we're going to do code reviews." I don't think they thought that was going to happen. So rule number one: when you bring somebody in to fix it, like a senior app manager or product manager, make sure they're extremely technical, especially if you're trying to fix a problem, because that's how, as we say, you take the smoke out of the cockpit. I literally went around the room with the lead engineers through the problems our people were facing and that were being reported in the issues list. I tackled them one at a time, having them show me the components, how they were tied together, the dependencies, how they architected it, how the screens were being done. And I literally brainstormed. I had also flown people in from around the country for that session, for that one

Starting point is 00:19:36 month, to build an ultimate strategy on how we were going to fix this. Once I had established that, the next thing I did was go on a national tour and visit some of our largest call centers, where these customer service reps were taking phone calls. I made my team, including the business analysts, the development staff, and the folks doing some of the design work, sit down and listen to the agents all day and watch them move through the screens. Watch them get frustrated. Watch the various things they were running into. Take complete notes and then draft that back. Then we would formulate, from a usability perspective and a priority perspective, the number one things we could fix for our users

Starting point is 00:20:32 so we could slow down the pace of abrasion, as they called it. Call abrasion. Because at the time, WellPoint and the entire teams, development, IT, and business, were really, really serious about making sure that the customers were happy. So basically, this is something we would now expect to happen anyway in a software project: that you actually understand your end users, and that you have somebody on your team who represents the end user and can say, "Hey, this is what we need, this is what doesn't work right now." But you actually flew your team to the different locations and let them sit next to the users

Starting point is 00:21:13 and experience their frustration firsthand. And at this time, had you already rolled out the new system that had been built over the last two and a half years? Or was it still the previous system that was supposed to be replaced? Oh, no, this was the new one. It had been rolled out in strategic areas.

Starting point is 00:21:36 But that was before further adoption, of course. Yeah, there's nothing more emotionally impactful than that. Seeing a metric go off on a screen saying, "Hey, you've got some frustrated users," is super helpful going forward, because you obviously can't visit people all the time. But when you're trying to fix a system, it's much more impactful to watch it firsthand, especially if you're the one who's supposed to be designing the system. Yeah, exactly. And I have to say, I love the fact that you can physically sit next to somebody. It's not always possible, but that's why I think advances in our RUM technology, with the session replay we're building right now, are going to be a big help here,

Starting point is 00:22:27 because it shows you how people are getting stressed, moving the mouse around, or, as I think Simon always says, rage clicking and rage scrolling. That's obviously going to help a lot with the feedback. Oh, absolutely. We were doing that type of stuff, but the reason I think our stuff is going to be so incredibly powerful is that it has context. One of the toughest parts was establishing the context of the call with the data. So we'll be able to actually tie the problems to the session, and that was a huge piece of it. I'm curious: when you were going around having all these meetings,

Starting point is 00:23:06 pulling everybody together, you mentioned that you had these quarterly releases and it was kind of full throttle to each release. How did you manage to take enough time away from people to get this done? Because in many cases you'll say, all right, we're going to slow down some of the releases. In modern monolith-to-microservice efforts, a lot of times protections are built in and teams are given the leeway. But from the setup you gave, it doesn't sound like there was spare time to do that. And even if you did some discovery, how were you going to start implementing anything? So how was the time situation handled? Yeah, so there are really two pieces to it.

Starting point is 00:23:48 There's actually fixing the problem, and then there's assessing the problem. Assessing the problem was probably about a two-week tour. I was only using my lead engineers. They weren't necessarily doing the coding every day; they were doing more of the leadership pieces. And I brought some BAs and some business sponsors, because software development between IT and business is a very big relationship thing, even internally within an enterprise. So we formed some bonds. And it was also long days.

Starting point is 00:24:24 We were still on calls trying to handle QA calls and things that were going on for the next release, but the team members I put together were primarily not the coders. It was more the leads. Cool. So that means you stabilized things and started to put out a lot of fires where fires had to be put out. Did you then manage to take the system and fix it well enough to roll it out to the rest of the departments and locations? Or did you choose a different approach? The approach became two coordinated pieces. Number one, the enterprise service bus, believe it or not, because it was still in its growing stages, had not been fully versioned, which made the quarterly releases very difficult.

Starting point is 00:25:19 Because we weren't the only consumers of that enterprise service bus, there were fixes going on with other teams that were outside of my control. But from our perspective, we did some very, very strategic things. Number one, after doing all those assessments, we identified how to break the monolith, meaning we were going to break it down into separate service components that could all be built independently. That required quite a bit of object-based re-architecture and dependency changes, such as dynamic loading of dependencies. Instead of having to build this one big WAR file, we could break everything out and build it by individual module, from the membership modules to the claims modules to the product and plan modules. All that stuff could be built in its own separate pipeline. Now, certainly, as you mentioned, you can't be doing that while the same team is doing the changes that must be done for the next quarter. So I had to push for, number one, a full-blown design up front that we sold to upper management, asking for a certain amount of money, and I started an entirely parallel team. All that team was responsible

Starting point is 00:26:47 for was the completely new branch, the new deployment methods, and new everything, running in parallel. They would have to work with merges and take in the new changes that were happening in the quarter. But ultimately, I had to create a completely independent team that worked in parallel with the mainstream team. And how big was the team? How much funding did you get? So it was a $4.3 million ask for the fix path, because instead of going off of the master group branch, we would separate that into separate build branches and separate pipelines. And the funny thing is, even though I had a 70-person team, I think we had only about eight to ten guys on the fix team, basically the new architecture team. And, you know,

Starting point is 00:27:49 it was almost like we said: build a hello world for a completely new type of pipeline. We used things like Maven; we broke things into Maven modules, and that team used a lot of strategies that were independent of what was being done in the main line. And they were empowered to make whatever changes they needed to make happen. And that was really the only way. I was out there, and all the code changes that went into the quarterly release... And they were responsible for taking these code changes and also porting them over to the new architecture? Is this what I hear? Yep.

Starting point is 00:28:43 That's what they did. And the way they did that was to leverage resources that were already being used; a lot of the classes they could still reuse. It was more about decoupling logic. In the way they restructured some of the objects, the abstracts and the macros and things like that, they created an engine that let us break those things down rather than having static dependencies. So it was obviously not, and I guess I got this wrong in the beginning, I thought it was more like a copy-paste exercise, but that's not what it was. Because obviously it's not smart to start a new project by copying and pasting it all and adding some Stack Overflow

Starting point is 00:29:29 copy and paste as well. Exactly. Yeah. The other thing that was really important, and we had them do it on the main line rather than just the separated line, was what we called quieting the system. I see this a lot in the companies and enterprises I go to, especially in the larger applications, where they're just spewing out tons of exceptions, thousands and thousands of them. And people don't realize that, especially later on when you want AI and you want to baseline things, it's very difficult to isolate issues in a pipeline when the pipeline is spewing hundreds of thousands of exceptions all the time. So right out of the gate, the first thing I asked, even of the normal development team, was to quiet the system: to go down from 100,000 exceptions and log errors to, we're talking, like 90, so that when something does happen in

Starting point is 00:30:46 the pipeline, when somebody introduces a new change or a new problem and checks in a new issue, it sticks out like a sore thumb. You would be amazed at how many systems out there are what I would consider noisy, and people don't realize how important it is to quiet those systems. Yeah, because otherwise you will never really find the anomaly: if the whole system is basically behaving abnormally, you never really find it. Yeah, that's more common than you think, believe it or not. I think the record I saw was at an old client I used to work with. I went in on a proof of concept, they were running about 14,000 exceptions a minute, and I said, "Here's a problem." They were like, "Well, we know about all those." I'm like, "This can't be good for the system."
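Why a quiet system makes a new problem "stick out like a sore thumb" can be shown with a small comparison against a baseline. This is a minimal Python sketch, not anything described as WellPoint's or Dynatrace's tooling; it assumes exception counts have already been grouped per signature for two equal time windows, and the signature strings and the 2x spike factor are hypothetical.

```python
from collections import Counter


def new_or_spiking(baseline: Counter, current: Counter, factor: float = 2.0) -> dict[str, int]:
    """Return signatures that are absent from the baseline or grew by more than `factor`.

    Both counters map an exception signature (a hypothetical format such as
    "ClaimsScreen:NullPointerException") to its count over the same window length.
    """
    flagged = {}
    for signature, count in current.items():
        before = baseline.get(signature, 0)
        if before == 0 or count > factor * before:
            flagged[signature] = count
    return flagged


# In a quiet system the baseline is small, so the one new signature stands out.
baseline = Counter({"MemberDao:TimeoutException": 3})
after_change = Counter({"MemberDao:TimeoutException": 4,
                        "ClaimsScreen:NullPointerException": 12})
print(new_or_spiking(baseline, after_change))
# {'ClaimsScreen:NullPointerException': 12}
```

With 100,000 exceptions every five minutes spread over many signatures, the same check would return a long, constantly churning list, which is the practical argument for quieting the system before relying on baselines.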

Starting point is 00:31:32 And Brett, what was your approach to quieting? Does that mean people really had to look into these exceptions, figure out why they were thrown, and then fix the root cause? Or does quieting the system mean you shall not log things that are not essential, so they don't make it into the log files and don't make the noise? Yeah, it was a combination of things. Some of it was just rudimentary: okay, sort-count them. If you take your whole exceptions list and sort-count it, you can say, okay, I can knock off 30,000 of those right out of the gate, because they all lead back to a specific path, so stop ignoring it and hit it. Boom, you just knocked out 30,000 of them. Logs work in a very similar way. I've seen many times where people would log something and just flag it as an error, but it wasn't really even an error, and it had rippling effects by marking it

Starting point is 00:32:38 an error. So, yeah, turn it into info, turn it into warn, and let's dial it down. And then, of course, there were the other noises, and I would tally all this stuff up as technical debt. The more logs, exceptions, slow DB calls, and HTTP errors in your web requests, the more that tallies up to technical debt. And when you add it all up, if your numbers are that high, you've got a lot of technical debt, and as an app manager you should put reducing it at the forefront, even as high as the features or functions you're delivering. The reason is that, many times, buried in that technical debt are cascading issues that are feeding QA issues and problems out in the system and manifesting themselves in different ways.
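The "sort count" step Brett describes can be approximated with a short script that groups exception lines by signature and ranks them. This is a minimal Python sketch under stated assumptions: the log line format below is hypothetical, and the signature is taken to be the Java-style exception class name plus the first "at" frame, which real logs may not provide in that shape.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical pattern: a Java-style exception class name, optionally followed
# by " at <first stack frame>". Real log formats will need their own pattern.
EXCEPTION_RE = re.compile(r"(?P<exc>[\w.$]+(?:Exception|Error))(?: at (?P<frame>[\w.$]+))?")


def sort_count(lines: Iterable[str], top: int = 10) -> list[tuple[str, int]]:
    """Group exception lines by (exception class, first frame) and rank by count."""
    counts: Counter = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            signature = match["exc"] + " @ " + (match["frame"] or "?")
            counts[signature] += 1
    return counts.most_common(top)


log = [
    "10:00:01 ERROR java.lang.NullPointerException at com.example.claims.ClaimView.render",
    "10:00:02 ERROR java.lang.NullPointerException at com.example.claims.ClaimView.render",
    "10:00:03 WARN java.util.concurrent.TimeoutException at com.example.member.MemberDao.find",
    "10:00:04 INFO call answered",
]
for signature, count in sort_count(log):
    print(count, signature)
# 2 java.lang.NullPointerException @ com.example.claims.ClaimView.render
# 1 java.util.concurrent.TimeoutException @ com.example.member.MemberDao.find
```

The top entries are the candidates for knocking out thousands of exceptions with one fix. Demoting messages that are not really errors to INFO or WARN is a separate change made in the logging calls themselves.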

Starting point is 00:33:38 And I found that exercise to be incredible, because as we're trying to build a new line that works better, I can't be seeing all of this other noise going on. It really helps me identify when something else is introduced that's a problem. Yeah, I think we should come up with something like a health indicator. Because basically what you're explaining is that if you are operating on an unhealthy app, if the immune system of this app, let's say, is not good, and you add something on top of it, you don't even know whether the underlying system can actually handle what you put on top. So it's something like a health indicator, and it seems there are a couple of metrics you just brought up, right? I mean,

Starting point is 00:34:29 your approaches to quieting the system: the number of logs, the number of exceptions. These are all indicators of how healthy an application or a system is. Yeah. Yes, absolutely.
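Andy's health indicator can be made concrete by tallying the noise categories Brett lists (error logs, exceptions, slow DB calls, HTTP errors) and normalizing by traffic, so a new build can be compared against a quieted baseline. This is a hypothetical sketch: the field names, the per-1,000-requests normalization, the 10% tolerance, and the sample numbers are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class NoiseSnapshot:
    """Noise counts for one build or time window (hypothetical fields)."""
    requests: int
    error_logs: int
    exceptions: int
    slow_db_calls: int
    http_errors: int

    def debt_per_1k_requests(self) -> float:
        """Total noise events per 1,000 requests: a rough technical-debt indicator."""
        noise = self.error_logs + self.exceptions + self.slow_db_calls + self.http_errors
        return 1000 * noise / self.requests

def regressed(baseline: NoiseSnapshot, candidate: NoiseSnapshot, tolerance: float = 0.10) -> bool:
    """Flag a candidate whose noise rate exceeds the baseline by more than `tolerance`."""
    return candidate.debt_per_1k_requests() > baseline.debt_per_1k_requests() * (1 + tolerance)

# Illustrative numbers only.
baseline = NoiseSnapshot(requests=200_000, error_logs=400, exceptions=1_200, slow_db_calls=150, http_errors=250)
candidate = NoiseSnapshot(requests=210_000, error_logs=380, exceptions=2_100, slow_db_calls=160, http_errors=260)
print(baseline.debt_per_1k_requests(), round(candidate.debt_per_1k_requests(), 2), regressed(baseline, candidate))
```

Normalizing by request volume keeps a busier week from looking like a regression; UI-side JavaScript errors and crashes could be added as another field in the same way.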

Starting point is 00:34:39 Yep. And you'll get them from the UI too, so it's not just at the app server level. If you get JavaScript errors or crashes, you can build a list of those as well. The moral of the story is that people need to stop ignoring it. If you're told, as an app manager, that your system's got all kinds of issues, treat those as just as important as the functions going out in the next release.

Starting point is 00:35:09 Yeah. Cool. So, quieting the system. What else? Anything else you did afterwards? I think the other thing we did a lot of was tracking traceability. One of the most important things about a system is that somebody comes up with an idea, and, you know, agile is great for tracking it and making sure that change happens, but people don't realize how

Starting point is 00:35:37 important that traceability is: who entered this idea, and what stages has it gone through? If it runs into a problem, how do I get that problem back to that person? So the whole traceability route was a huge piece of it as well. We worked a lot on what happened when a function went in that was requested by a particular project team. We had values on it, and I'm not talking about just SLA metrics: a change going in could carry a dollar amount of something like $350,000. For a $350,000 change, we should have more eyes on it, more tests on it, and more things we follow, and we feed back not only to the developers but to the business how effective that change is as it goes through the pipeline. So we worked a lot on the whole traceability piece: entering and documenting what the user story was, and then

Starting point is 00:36:39 going through and having accountability for it. So that was another very big thing. Yeah, and I remember when we were sitting on the balcony, we also talked about tracing back the business success: how does this feature, or whatever the requirement was, actually behave? How is it accepted? How much does it cost? I think we talked about feeding these metrics back, for instance, from a tool like Dynatrace into your JIRA, or whatever you use for requirements management, and then actually seeing what the adoption of that feature is.
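A traceability record of the kind Brett describes links a change to who requested it, the stages it has passed through, and its business value, so that scrutiny can scale with value and adoption can be fed back after release. The sketch below is hypothetical throughout: the fields, the $250,000 threshold, the reviewer counts, and the adoption ratio are assumptions for illustration, not the project's actual rules or any JIRA or Dynatrace API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ChangeRecord:
    """Traceability record for one requested change (all fields hypothetical)."""
    story_id: str
    requested_by: str                 # who entered the idea, so problems route back to them
    value_usd: int                    # business value attached to the change
    stages: list[str] = field(default_factory=list)
    adoption: Optional[float] = None  # filled in after release from monitoring data

    def advance(self, stage: str) -> None:
        """Record each pipeline stage the change passes through."""
        self.stages.append(stage)

    def required_reviewers(self, high_value_usd: int = 250_000) -> int:
        """More eyes on expensive changes; threshold and counts are illustrative."""
        return 3 if self.value_usd >= high_value_usd else 1

    def record_adoption(self, feature_users: int, active_users: int) -> None:
        """Close the feedback loop: share of active users who used the feature."""
        self.adoption = feature_users / active_users if active_users else 0.0

change = ChangeRecord("STORY-101", "claims-project-team", 350_000)
for stage in ("build", "qa", "staging", "production"):
    change.advance(stage)
change.record_adoption(feature_users=1_200, active_users=4_800)
print(change.requested_by, change.stages, change.required_reviewers(), change.adoption)
```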

Starting point is 00:37:17 How long did it take until it was adopted? That's closing the feedback loop back to the business, and obviously to the developers too. Huge. Yeah. That's why I was so excited when you showed that slide. I always tell you I listen to your podcast, and you and I converse on some of the material we both use. And I was so excited when I saw that.

Starting point is 00:37:43 More so because of my past. I was like, where was that in my past? So with all that, I assume the whole project was a success in the end? Yeah, well, the story continued onward, because at that point in time, and I won't mention the name of the company that was sold, just as this was happening, WellPoint had sold into a massive account. One of the requirements for this application was to go to 80% video interactions with the nurse care agents, and that presented a whole battery of new challenges. Do you record?

Starting point is 00:38:37 Do nurse care agents go on a video screen? Some didn't want to be on video. So they created new positions and new logistics for the cubicles. We had to figure out how we were going to get video onto the customer service desktop so they could converse, back at that time. And right at that time, I was part of a team engineering all the solutions around that, coming up with the dollar amounts and trying to procure. And I think at that point in time, I was like, I think

Starting point is 00:39:14 I'm a little bit burned out from all the challenges here. And it just so happened that right around that time Dynatrace was asking if I was interested; they found me on LinkedIn. So I'd have to leave the rest of the story, where everything ended up, to the person who followed me. But I did win an award. The breaking down of the monolith was very successful. We had new build paths. I was thanked by numerous people. There was a reorg, even with the new reorg vice presidents and so on. So yes, from my perspective, when I left it, it was in a very successful state.

Starting point is 00:39:54 Can you tell us a little bit about the build times? You mentioned that when you took over the project, the build time of the big monolith was about two to three hours. What was the metric in your broken, well, not broken, but broken-up system of individual pieces? Do you remember the average build time? Yeah, I think we actually got it down to 35 minutes, because essentially just the primary framework had to get built first, and then all of the individual modules that were broken out, like claims and membership, could be built in parallel. I think the overall CPU processing time was very similar.
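The build-time improvement is mostly scheduling arithmetic: once the framework is built, independent modules can build concurrently, so wall-clock time becomes the framework time plus the slowest module, while the total CPU work stays about the same, as Brett notes. The durations below are hypothetical, chosen only so the numbers land in the ranges mentioned (a bit over two hours serial, about 35 minutes parallel); only claims and membership are module names from the conversation, and the rest are placeholders.

```python
def serial_minutes(framework: int, modules: dict[str, int]) -> int:
    """Monolith-style build: everything runs one step after another."""
    return framework + sum(modules.values())

def parallel_minutes(framework: int, modules: dict[str, int]) -> int:
    """Framework first, then independent modules at once: wall clock is the slowest module."""
    return framework + max(modules.values(), default=0)

# Hypothetical durations in minutes; module_3..module_8 are placeholder names.
FRAMEWORK = 20
MODULES = {"claims": 15, "membership": 15, **{f"module_{i}": 15 for i in range(3, 9)}}

print(serial_minutes(FRAMEWORK, MODULES), parallel_minutes(FRAMEWORK, MODULES))
```

This assumes enough build agents to run every module at once and no dependencies between the modules themselves; with fewer agents, the parallel time grows back toward the serial one.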

Starting point is 00:40:38 But because we were able to run those pipelines and build modules in parallel, we compressed it down to about 35 minutes. Cool. And did you stick with Java, or did your teams pull in any new technologies? Yeah, we definitely stuck with Java. We had to, because we were really just abstracting pieces to do the decoupling, and we had to continue to leverage the code that was still being built, like the screen resources and things like that. We were still leveraging a lot

Starting point is 00:41:10 of that code. We weren't rebuilding screens; the engineering marvels were in how to decouple it while still maintaining the primary code base, the business functionality and so on. Which is also a great testimonial, because I think there are certain stigmas out there, that the big monolithic apps come from the Java people or the .NET people. But obviously, with any technology stack, you can either build a bad monolith or, if you architect it right, build something that is architecturally more sound, scales, and

Starting point is 00:41:50 you can break it into individual components that you then test and build independently. So it's not the technology stack that puts you on the path toward a monolith or not; it's really what you do with the technology. I mean, it should be common sense, but it's great to hear it here again. Well, and I think the key was that I had some really, really great people on that team, very sharp, smart Java people. Consolidating them into their own team and empowering them to do it is what made it even more successful. They weren't in the everyday battles with change. They were chartered with, hey, engineer something that will basically accomplish the following. That was their life, and what they needed to concentrate on was the

Starting point is 00:42:42 decoupling piece. Yeah, and if you think about it that way, a lot of organizations struggle to find the time for these massive projects. But if you take a stepped approach like you're describing, where, okay, we'll keep our language the same and just set up some decoupling, that makes the system easier to maintain, which could then free up resources to start on phase two. That might be, okay, maybe we can do this more efficiently with a different language or a different framework. It's a stepped approach instead of all-in-one, because I can imagine that if you tried to do a whole new framework and a whole new language at the same time, it would be much more difficult. Yep. And those discussions did happen. There were numerous people pushing to change out specific technologies. But we had the budget we had, and we also had all these other challenges, like bringing down the technical debt and making sure we still made the quarterly releases. We weren't releasing the broken-down piece until the final stages, when we felt comfortable it would operate as a broken-down architecture, so it wasn't on the same delivery schedule. But yes, we did talk about that.

Starting point is 00:44:01 And I was always trying to make sure we weren't biting off more; we were already biting off enough. Hey, I've got one last question. Breaking the monolith, finding these breakpoints or seams or whatever you want to call them: how did you go about it? Was it just the domain knowledge of the engineers telling you where it made the most sense to break the big monolith into pieces? Did you use some tooling? Did you do some trial and error? How did that work? Well, a lot of it came down to proper use of Maven builds

Starting point is 00:44:39 and dependencies. Then, from a re-architecture perspective, instead of statically linking or pulling things into these Java libraries, where bringing one in brings a whole bunch of other things with it, we found breakpoints where we could load libraries dynamically, especially for resources and screens. Rather than baking all those resources into one big WAR file, we broke them out into their own builds. As soon as we did that, we'd see cross dependencies appear, and then the team would solve them: how do we break that cross dependency? So it really went like this: you design the broken-up model, your desired state, and you break the monolith. Then you go back into the code base and find all the places that depend on the other pieces. And the build techniques were different, because instead of one big, massive build where you had to think of all the dependencies and download them, we got a lot smarter with Maven, Maven profiles, and things like that. Cool. Wow. And now, that many years later, thanks for making the move over to our team, to Dynatrace.
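Once resources and screens are split into their own builds, the cross dependencies Brett mentions show up as cycles in the module dependency graph. A depth-first search that tracks which modules are on the current path will surface one such cycle for the team to break. This is a standalone Python sketch over a hypothetical module graph; it is not how Maven itself resolves dependencies.

```python
from typing import Optional

def find_cycle(deps: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of modules, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on the current DFS path, finished
    color = {m: WHITE for m in deps}
    path: list[str] = []

    def visit(module: str) -> Optional[list[str]]:
        color[module] = GRAY
        path.append(module)
        for dep in deps.get(module, []):
            state = color.get(dep, WHITE)
            if state == GRAY:             # back edge: dep is already on the path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[module] = BLACK
        return None

    for module in deps:
        if color[module] == WHITE:
            found = visit(module)
            if found:
                return found
    return None

# Hypothetical graph: claims and membership reach into each other.
deps = {
    "framework": [],
    "claims": ["framework", "membership"],
    "membership": ["framework", "claims"],
    "screens": ["framework"],
}
print(find_cycle(deps))
```

A back edge to a module that is still on the current path is exactly such a cross dependency; once it is broken (for example, by making membership depend only on the framework), the search returns None.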

Starting point is 00:46:07 It's been six years, and as you mentioned in the beginning, you're now leading the global practice team. Having somebody on our side with that experience, who walked through this particular process of taking an enterprise software stack that obviously had big issues and couldn't deliver on the business needs, breaking it apart, putting it on pipelines, and automating it, leading the global practice team hopefully gives a lot of our customers confidence that it's not only a great product that we sell, but that we also have a lot of people who actually know how to advise customers on using our product in their process, and how we can help them get to where they need to be, and

Starting point is 00:47:05 I think that's pretty awesome. I'm really glad we could chat a couple of weeks back on the balcony, because otherwise I'm not sure when I would have found out what you do. Yeah, I think we were a few glasses of wine deep at the time, too. I would say a few bottles. Okay. Maybe you're right. You're right, I lost track. No, and I love this position as well. I love the company, and I love the position, because I get exposed not just to one company; I get the opportunity to see this at the largest scales, in different ways, with different people and how they approach it. So it's very, very good for me as well.

Starting point is 00:47:50 And I appreciate all the opportunities I get here. Perfect. Cool. Brian? Yes. Yes. I think you kind of summarized it already. Come on.

Starting point is 00:48:01 Do it. Yeah, I think so too. But just quickly, Brett, as you probably know, we always do a little summary at the end. At a very high level, what I would like people to take away from this talk, if they are taking over a software project that is obviously walking and running in the wrong direction, is a couple of things we learned today. First of all, quieting a system. You have to quiet a system in terms of the logs and the exceptions

Starting point is 00:48:32 because only when you quiet the system down do you get rid of all the noise and actually see the regressions that individual new changes bring. I also like that you sat down in the beginning and did code reviews with individual teams. You obviously came in with a lot more technical skills than your predecessor had, and that showed that a different wind is blowing now and we really have to take this seriously. I also like that you took parts of your team,

Starting point is 00:49:03 flew them around the country, sat them next to the end users, and really learned firsthand where they were struggling. That's obviously the best way to get a close feedback loop to the engineering team on what to do next with the product and how to make it better. And the last thing: obviously you don't want to just rip and replace. What you really did was build

Starting point is 00:49:26 a parallel team that focused on breaking the monolith and decoupling, obviously reusing the same code, because you don't want to rebuild and recode the whole functionality, and having it run side by side until, at some point, you decide it's ready for prime time and replace the old system. It obviously also helped that you had a great reputation; that's why people trusted you. I think that's it. That's the summary from my side. Very good, Andy. The two things I learned, and I think these are important ones: number one, when Brett was talking about the budgets, I now understand a little bit better why, beyond the regular costs, health insurance is so expensive.

Starting point is 00:50:18 And number two: if you're ever placing bets on Brett in a challenge, and somebody bets that he won't eat a mouthful of wasabi, I'll bet that he will. But if they bet that he won't eat a mouthful of wasabi with a chaser of razor blades and lemon juice, I'll side with them, because he knows when it gets too crazy. But Brett, you like the challenges. So if someone ever gets into one of those

Starting point is 00:50:43 "I dare you to eat that" things, I think I have a better idea which side to bet on. Yeah. You would win. But thank you so much for sharing the story. It was awesome. Any final thoughts about all this? Anything you wanted to make sure people took away from it?

Starting point is 00:51:00 No, I think you guys did an awesome summary. My one addition, and I see this a lot, is how important it is to put a very good leader, with a big, strong vision and the willingness to lead that vision, in charge of projects this size, and then to make sure you empower and trust them. Although I had the position I did, I give a lot of credit to the directors, the VPs, and the business folks I worked with, who empowered me to make those changes and just went with what I asked. That is so important, from a top-down perspective, to breaking a monolith and building solid pipelines. I second that: we need leaders, not managers. Perfect. All right, well, thank you for being on, Brett. Andy, thank you for getting this one set up, and welcome back. Hopefully you weren't too put out by being locked up before.

Starting point is 00:52:12 If anybody has any questions or comments, you can contact us at @pure_DT on Twitter, or you can send an old-fashioned email to pureperformance@dynatrace.com. We'd love to hear any ideas. If you want to be a guest on the show and have things you want to talk about, let us know, and maybe we can get you on. Anything else from anybody, or is it time to say adieu? I think it's time to say goodnight on my side. Yes. Thank you guys for having me on.

Starting point is 00:52:33 Thank you. Thank you. Bye-bye.

