PurePerformance - 016 Transforming 6 Months Release Cycles to 1hr Code Deploys
Episode Date: October 25, 2016
Guest Star: Anita Engleder - DevOps Manager at Dynatrace
As a follow-up to our podcast with Bernd Greifeneder, CTO and Founder of Dynatrace, who talked about his 2012 mission statement to the engineering team: “We go from 6 months to 2 weeks release cycles”, we now have Anita Engleder, DevOps Lead at Dynatrace, on the mic. Anita has been part of that transformation team and in the first episode talks about what happened from 2012 until 2016, where the engineering team is now deploying a feature release every other week, makes 170 production deployment changes per day and can push a code change into production within an hour if necessary. She will give us insights into the processes and the tools, but more importantly into the change that happened with the organization, the people and the culture. She will also tell us what she and her “DevOps” team actually contribute to the rest of the organization. Are they just another new silo? Or are they an enabler for engineering to push code faster through their pipeline?
Transcript
It's time for Pure Performance.
Get your stopwatches ready.
It's time for Pure Performance with Andy Grabner and Brian Wilson.
Oh, hey, hey, everybody.
It's Brian.
It's 8:45 in the morning for me.
Sorry, I'm still waking up.
So I'm not my usual, energetic self today.
And it's 8:45 today because our guest today is over in Linz, Austria. We'll get
to her in a moment. But yes, we're making special accommodations to get everyone in
a decent time zone, although I guess I could have done it earlier. Anyhow, Andy, how are
you doing today?
I'm good. Actually, what I should say now is, ¿Cuál es tu problema?
That's one of the sentences I used most frequently last week in Chile and in Peru.
Like, what's your problem?
Come on.
You're sleeping at 8:45.
Hey, what's your problem?
What's your problem?
Exactly.
Well, you know, life is good here.
My time zone, whatever it is.
I think I'm two hours ahead of you.
10:45.
I'm wide awake, on my third cup of coffee, and I'm really excited about, as you said, our special guest today from
our Linz, Austria lab. Are we ready to introduce her, or do you have any other interesting sounds
to give?
Yeah, I'm sure I have interesting sounds, but they're not near the microphone.
So Brian, in the last episode, I think it was really cool to have Bernd Greifeneder
on the call, and he was basically explaining the transformation that Dynatrace
as an organization, but especially we as a development team, went through. Because he said that a couple of years ago, to kind of escape the innovation dilemma,
as we call it, we said we need to basically reinvent not only ourselves,
but also the APM market.
And what he said out there, he said,
hey, team, I want to go from two releases per year to deploying a change
into a new SaaS-based software every other week,
but with the option to deploy a change into production every hour.
So basically going from, I think he also said from agile development that we already did
even before kind of everybody knew that this was called agile and continuous integration.
So we already did some of these best practices before.
But he wanted to extend that and say we need to do continuous delivery.
And one thing that he was very particular on is continuous delivery with feedback loops.
And now, without much further ado, I want to introduce and say hi, Anita.
Welcome to the show because you are actually one of the key people that he mentioned,
besides, obviously, a lot of people in the engineering teams at Dynatrace
that actually made that transformation possible.
And so without further ado, hi, Anita, how are you?
Hi, Andy, hi, Brian.
It's really cool to be part of the show today.
I'm sitting here in Linz in Europe, so it's about 5 p.m.,
so I'm at my fifth cup of coffee.
So I hope that I do not fall asleep here,
but you will help me and Brian, I guess.
Yeah, so you talked already about what Bernd's goals were.
I think it was two years or two and a half years ago
when we said, okay, we have to release every second week
and a change has to be in production within an hour.
And two years later, we're actually able to do this.
We do this; it's our daily business.
It was kind of a challenge, and somehow a big transformation as well.
But we are really proud of what we did.
And, of course, there are always things you can do better.
But we are very happy with the processes we have at the moment.
Yeah, and I just wanted to put this in context kind of to a few episodes back.
We had our guest Adam Auerbach from Capital One, who was talking about how
theirs was really more of a monolithic change, I believe,
where they were much closer to waterfall, trying to get a huge change in place.
And what I'm looking forward to hearing about this one,
because as you mentioned,
there was already the Agile and CI components there.
And what I imagine goes on in different companies
is everyone's like, hey, we're Agile, we're CI,
and everyone's patting themselves on the back, right?
And then someone like Bernd comes along and says,
hey, we're going to release every hour.
And no matter how awesome you think you are, even if you have that Agile CI part implemented already, it's still a huge, huge undertaking, full of challenges but with great rewards.
So that's kind of what I'm looking forward to really hearing about today.
Now, Anita, in the preparation to this call and obviously in conversations we had over the months and years because we
obviously both used to work in the same office and we exchanged thoughts.
Now, in the preparation, we talked about one thing that I thought was very interesting
and will be interesting for our listeners is how you actually kind of organize your
teams because I know there's a lot of people actually working on it.
I think Bernd said we had like a hundred engineers almost from
the beginning, organized in different teams to work on this new way of developing and deploying
software. And I also think that your team in particular, because your official role, I know,
is DevOps manager, but you actually see yourself maybe in kind of a different role, especially
your team. So what I would be interested in is, A, what is your team doing and providing that actually makes all of this a success?
And how did we also change from an organizational perspective?
What is a team now within Dynatrace that is working on this rapid deployment model?
That would be very interesting.
So can I start with B and then add A?
Yeah, sure.
Or do you like to have A first?
No, no, perfect.
Okay.
No, no.
Yeah.
So let's first talk about the typical development team.
So as Brian mentioned, we started with agile development
and we were already good at continuous integration.
So it means our mindset two years ago was already an agile one.
And the teams were already really used to thinking somehow end-to-end.
And we had also very small teams.
So this kind of two pizza teams, we never had a lot of people in one team.
So from the team size and also from the mindset,
it is not really a very, very big change.
But what we are trying at the moment is always to form small teams
that have an ownership end-to-end.
It means these end-to-end feature teams.
It means typically that the team also owns the UI,
owns the middleware, and owns the database.
And it goes actually further because for these parts,
they not only own the implementation,
they also have to take care of the test coverage.
They have to take care together with their product management, how they bring this feature to production, so what's the go-to
production plan, how they want to stage it, and they also are responsible to think about and maybe
also to implement the feedback loop, so how can they find out if the feature they are trying to build is working
as expected? How do they find out how our users are using this feature and so on? So it's a big
ownership. And these last two parts I mentioned, the feedback loop and how we bring it to production,
this was not part of the team ownership two years ago. This is actually the part we added here.
So this is actually the biggest change in our team organization.
And this is what we mean when we say, okay, we have end-to-end feature teams.
I talk here about ownership.
Ownership means that they have to drive it.
They have to think about it.
They have to involve people if they need some help.
It does not mean that they need to implement all of it in their team.
Otherwise, they would be huge teams, and they would not be effective either.
So sometimes the team really implements these parts itself.
Mostly the main feature, also the UI and also the test coverage, is built by the team.
But sometimes it just means involving another team
that is helping here. Sometimes it means involving a team that is taking care of the deployment
automation or of the feedback automation. So it means they act here somehow as a product manager.
So the mindset here from a dev perspective is that you are really like a product manager.
They define stories like a product manager does for other teams.
This is really a big change for us.
So now, before you go on, you answered the question perfectly,
but I have some follow-up questions now.
So I think the most interesting thing is what you said.
Your development teams are now fully responsible end-to-end, and they need to think about how to not only deploy it into production, but also how to monitor it in production.
So that's one big piece.
And obviously, we are a monitoring company, so we hopefully know how to monitor things.
And did we actually use our own products for that as well?
Just an FYI or just a quick question.
Yes, sure.
So we always try to eat our own dog food or drink our own champagne.
So this is always our mindset.
And of course we use our own product in order to improve it as well.
You know, I just got to say I think drink our own champagne is a much better way of saying that.
I don't know who thought of the dog food one.
Yeah, our product is more a champagne than a dog food, right?
But who would want to eat their own dog food?
I mean that's just – yeah, thank you for that.
I'm going to keep that one in mind.
But I think dog food these days is actually pretty delicious too.
I don't know.
At least there seems to be a lot of high-scale products out there.
All right.
Let's go back to DevOps and continuous delivery instead of dog food.
So the other thing that I was interested in: you said they are responsible for the full stack, from UI to implementation to database.
How does this work?
Do you have a UI specialist now in every team?
Isn't that amazing?
Isn't that really hard that you have the skills,
like all the skill sets in every team?
Yeah, in many teams, we really have a UI skill set,
but we also have a full web UI team.
This web UI team typically prepares these dashlets or a framework
that the other teams can use.
For example, what a button looks like, or what a typical line chart looks like.
So they can reuse that.
So they're creating like a repository of all the different web elements
that all the other teams are able to pull in. And of course, the teams can ask the WebUI team
if they need some help.
And so the WebUI team is also kind of like a feature team, right?
It's like you are the feature team
that are responsible for the WebUI framework
and kind of we are your customers
and we demand some more feedback.
That's awesome.
Yeah, that's perfect.
And with these teams,
was there resistance in the beginning
to being responsible end-to-end
for what happens if something fails in production?
Who is on call?
Was that a big problem,
a big discussion point in the beginning as well?
Not really from the development point of view, more from the management point of view, because we had no
full understanding of how this should work in an emergency mode, on a weekend and so on.
When a team is covering all of this and bringing changes to production so fast, who is taking care of adjusting, for example, the monitoring?
If you change a feature, you maybe have to adjust the monitoring.
You have to adjust the alerting.
You maybe have to adjust the runbooks.
In an emergency case, this was more the trouble.
From the development perspective, it was not such a fast change, because what we tried first was really to work as we
were used to working.
So have a special web UI team, have a team working on the database.
And Andy, you know this, but Brian, I'm not quite sure.
We are separated in four locations.
So we have a huge development team here in Linz as well as in Gdansk.
And we have one in Detroit and one in Waltham.
And we try to work cross-lab with these teams.
And for example, the web UI team is in Gdansk.
And it was sometimes really hard to talk and discuss web UI things across
labs. So it was rather natural to bring some parts from them to the teams and the web UI team,
for example, tried to build a framework so that the team can use this in a better way.
So it was more a natural process. We first tried it the way we were used to and then found out, okay, we need some improvements.
Also for this ownership that you have to take care to bring your code to production.
At the beginning, they actually did it as they always did it.
And sometimes it was not working properly in production.
And they started, of course, to involve themselves in trying to make it better.
Because whenever something goes wrong in production, we do a follow-up meeting with all the
devs.
And they can bring in ideas in order to improve it the next time.
And typically, this ended up in, OK, I have to think about how I bring my feature to production.
Do I need some migration?
Do I need some opt-in toggles, or do I need a feature toggle, or whatever it means?
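To make the toggle idea concrete, here is a minimal sketch in Python. It is not Dynatrace's actual implementation; the `FeatureToggle` class, the tenant IDs, and the feature name are hypothetical and only illustrate the difference between an opt-in toggle (enabled per tenant) and a feature toggle switched on for everyone.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureToggle:
    """Hypothetical toggle: off by default, enabled for all or per opted-in tenant."""

    name: str
    enabled_for_all: bool = False
    opted_in_tenants: Set[str] = field(default_factory=set)

    def is_enabled(self, tenant_id: str) -> bool:
        # A feature is visible if it is globally on or the tenant has opted in.
        return self.enabled_for_all or tenant_id in self.opted_in_tenants


# Ship the code dark, then let a single (hypothetical) tenant opt in.
toggle = FeatureToggle(name="new-dashboard")
toggle.opted_in_tenants.add("tenant-a")
```

With a toggle like this, a team could deploy code to production without exposing it, stage the rollout tenant by tenant, and flip `enabled_for_all` once the feedback looks good.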
Yeah, it's a pretty cool, natural process. Yeah, and I think what's really amazing is, I mean, first of all,
you have a lot of power now as a development team, as a feature team. But with that obviously comes responsibility.
But what you give them is actually the chance to grow from just doing development but really think about end-to-end what it means to operate stuff and also what it means to build the right thing in order to make the company successful in the end.
So I think this is just like giving the responsibility and the ownership to the team
is just leveling up the whole team. Because as you said, if they fail,
they first of all learn from their failures. But I think in the end, they will start building
better quality products from the start because they obviously want to avoid these war room
scenarios. And it seems, even though it might be a rough path in the beginning, this is the right
way to develop software: to give the ownership, the responsibility, but also the power to the teams,
and they can run on their own. It seems fantastic to me.
Yeah, and I think it's an
interesting juxtaposition too, again going back to the Capital One story, where
in that situation the challenge for the actual team members was you had developers,
you had operations, you had testers, right?
And they all were siloed in their own components.
And you now had to get developers thinking about testing and operations.
And you had to get testers thinking about development and, you know,
having reusable scripts that run.
You had to get all teams kind of thinking in parts of the other teams.
And that was a huge, huge challenge.
I mean, for all the employees on the teams,
what Adam was talking about was everyone had to adjust
and everyone had to adopt and adapt or move on, basically.
And that's a really much deeper-level transition.
And because you all had already started in CI/CD, you still had a transition for that whole
ownership and the deployment kind of piece of it, but that was a much higher-level transition.
The teams already had that experience. So although I'm sure there were a lot of challenges, it sounds like it's something
that is not as much of an initial hurdle, because
some of those changes had already been done. And
the thought that goes through my head about that is wondering, if somebody is
transforming from, you know, waterfall monolith and trying to get to CD:
Do they try to make a first pit stop to get to Agile CI and then the second one,
or just try to tackle it all in one, like Capital One did?
It's just an interesting question for, I guess, any organization to maybe pose.
And I don't think there's an answer to it, but...
Yeah, that's a really good question.
I have to think about that.
We typically try to get all in one.
But Andy, I think half a year ago,
you were showing me such a slide with a continuous delivery pipeline.
Do you know this?
This red pipeline where everything is failing?
Yeah.
And then another one, a green one, where you have a lot of monitoring
and at the end there's a super quality product.
Exactly, yeah.
And you said, okay, this red one you should not do
and the green one is how you should do it.
And actually I told you, do you know this red one was our first trial?
So this is how we started.
We tried to do everything at once.
But the biggest difference is here that the end result was not our production environment.
So we made our dev environment, as we call it here, the end of the pipeline.
That means what we did was simply build up an automation pipeline
with some monitoring we already had in place,
building the product every day from the trunk and deploying it to our dev stage.
And we made our whole team somehow responsible for this dev stage.
And they also have to use this dev stage to demo
their features. So when the dev stage is down and not working, they cannot demo their feature
after the sprint, and then it's not ready. So there's no outcome, no new features are coming out, and product
management cannot show new features. So the whole team has to stop and focus on the quality of the pipeline. And this made us actually very fast because this brought the right focus.
So it was not that one team was focusing just on its features and trying to get a good feature out
while the other team was completely failing.
So this brought us really together as a team, also as a cross-lab team, because you have to
consider that there are components developed in Detroit and Waltham and some components in Linz and Gdansk.
And every day we do a big bang deployment of these recently changed components from trunk.
And when it's failing, everybody has to stop and work on this dev stage.
And this actually took us from this red pipeline where everything
is failing to a green one in the dev stage. And once we were good at that, we used the very same
pipeline to bring the code to production. So this is actually our story and this made us successful.
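The approach described here, one pipeline that first ends at the dev stage and is later extended to production, can be sketched roughly as follows. This is an illustrative Python sketch only; the `run_pipeline` function, the stage names, and the health checks are assumptions and do not reflect the real Dynatrace tooling.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a function that deploys a build and reports health.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(build_id: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Push a build through the stages in order, stopping at the first failure."""
    passed: List[str] = []
    for name, deploy_and_check in stages:
        if not deploy_and_check(build_id):
            # A red stage stops everything: the team fixes it before moving on.
            return False, passed
        passed.append(name)
    return True, passed


# First iteration: the pipeline ends at the dev stage (hypothetical checks).
dev_only: List[Stage] = [("dev", lambda build: True)]

# Later: the very same pipeline is extended to production.
with_production: List[Stage] = dev_only + [("production", lambda build: True)]
```

The point of the design is that nothing new has to be built for production: once the daily trunk build is reliably green on the dev stage, production is just one more stage in the same list.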
Yeah, I remember the days when we had sprint reviews and then it was like, hey, what's the system you're demoing this on right now?
Well, it's my local machine.
No, that's not the way it should be.
It should be the Dynasprint.
We call it the internal system.
So for the people out there, we have the Dynasprint environment where we deploy sprints.
We have the Dynaday environment, as you said, where we deploy daily sprints.
We have our production environment.
So I remember that.
That's really good. So we forced ourselves to use our day-to-day builds for doing all of our demos for showing that the features actually worked.
And it was painful in the beginning, but we all leveled up and produced better software.
That's cool.
Hey, going back to question number one now.
So we know kind of what the teams look like.
Now your team, and I know your team is an integral part of the whole thing, actually kind of orchestrates everything, and not only orchestrates different teams, but I think you are also responsible as a product team on its own for a very core component of our software, which is the orchestration layer.
So can you tell us a little bit more about your team,
what you're doing and how you operate?
Yeah, I try to give you some insights.
So actually we have many responsibilities.
On the one hand, my team, and especially I personally,
am responsible that production is up and running.
So at the end of the day I'm responsible for that, but we play this a bit differently than
maybe other teams that have this DevOps label are doing it. We do
this job by giving development the responsibility. So actually they
are responsible for monitoring their own features, the health of the
features, and for making adjustments in case of emergency. So on a weekend or at night, when something goes
down, they are automatically alerted. But the goal is always to keep the number of alerts at the level
of zero. Whenever such an alert happens,
we have a follow-up meeting with the responsible development team,
also with the chief software architect in order to avoid this in future.
This could be, of course, a bug fix, but could be also an architectural change
or a better failover or something like this.
So this is actually the way how we do operations.
So we do not have a 24/7 operations team looking at dashboards, looking at measures.
And that's why Bernd is sometimes saying, OK, we are doing NoOps,
not DevOps.
We are already doing NoOps.
So this is one big part, really to help the dev teams to level up and improve the quality of the features and the quality of the whole orchestration layer,
meaning the deployment automation, bringing the code to production,
and also for the feedback.
So monitoring the product that is in production,
monitoring the failover in case it's happening automatically and so on.
So we have a huge team, actually in Gdansk, building this component
we call the orchestration layer and doing all this stuff for us. Yeah, and what I said before,
and why I asked, Andy, can I answer question B first, is that our dev teams are also responsible for thinking about how their code comes to production and how their code should fail over in case of an emergency.
So it means that the dev teams are also somehow product manager for this orchestration layer and defining stories.
So typically, 90% of the stories this team is getting is not from me.
It's really from devs.
I'm just here also to prioritize the stories and to coordinate this.
And this is working really, really great at the moment.
This is one part, bringing the product and the source code to production.
The other part is also helping actually the devs to have everything in place to get the feedback they need from all the stages.
This means not only for production stage, also for the stages before and to be able to consume then this feedback.
This could be sometimes that we need to think about how we want to improve our own monitoring solutions or how we want to improve Dynatrace in order to bring this feedback.
But this could also mean that we have to think about something special, maybe some special handling. For example, on the agent side, we built a special framework
so that the agent can send up some feedback if it thinks it's necessary,
about its own health or something like this.
So this whole feedback tooling is actually an important point.
And a really big point on my end is that the devs really have everything to
view the health of their features in all the stages, so not only on all the production
clusters but also on all the clusters we have in the stages before, and to be able to
consume this in five minutes per day
so that they know, okay, everything is fine.
I can focus on the new features.
Or, okay, it's not working as expected.
I have to focus on that.
I have to investigate that and leave the features where they are,
the new features, and focus on robustness and quality and so on.
So somehow a self-healing and a self-controlling system, so that nobody needs to create tickets for
development if something is not working as expected; they do it on their own. And maybe to put it in some figures: two years ago, when we were in this situation where
we released twice a year, we typically found out that something in production
was not working as expected through support tickets. So a customer was telling us, okay, this is not
working as expected, we have to fix that.
So nearly 100% was triggered from outside.
Somebody was creating these tickets for our development.
At the moment, a bit more than 95% of the tickets,
of the actual bug tickets or investigation tickets for things found in production,
are created by development itself.
So they find things that are not working as expected
by themselves, through the feedback tooling.
The rest, less than 5%, is from external, from customers.
So they ideally find things that are not working as expected
before a customer is affected.
So that's really great.
Wow. And that's basically, I mean, I know this is the second part of our talk later on.
We'll talk about what they actually do to get these feedback loops implemented.
But that's phenomenal.
It's a phenomenal change, because that means all of a sudden you transformed from a reactive to a proactive optimization mode, and it's just phenomenal.
Wow, that's actually the right wording: to make the devs really proactive in looking at their own
code, looking at their own features in all the stages, and taking care that it's working as
expected, and not only working in staging or load test. It should work in production, for the users.
That is the real user feedback.
And to know how many users are using my own feature and how they are using that,
it's really important to have this information directly in development's hands
because they can then focus on the right thing.
They improve the feature in the right area where users are using that and where users have value out of that.
So that's really great.
Now to kind of conclude this first part of our conversation, I just want to reiterate.
So basically what you are doing, you and your team: if I bring in an analogy to the automobile industry, you have a conveyor belt and you basically are pushing stuff on a conveyor belt with different pipelines until in the end, hopefully, a car comes out.
So what you're actually doing as your team, you are providing and automating that conveyor belt. You listen to the individual teams that actually push stuff through the pipeline, how to optimize that pipeline to make their work more efficient and easier so that they can really focus on developing new features.
So that's part of your team.
You're responsible for this pipeline.
You're providing the tools.
But you're acting as your own kind of product team.
As you said, you're getting requirements in mainly from the teams that are using this deployment pipeline.
And you're also helping them to put in monitoring so that they can actually get into this continuous
feedback loop.
And I think that's phenomenal.
And you have a lot of teams out there that are working on different features, very small teams.
The two-pizza team kind of mentality is within our organization.
We have teams that are working distributed on four different locations across the world in different time zones.
And it seems over the course of the last two to three years, since Bernd set the goal of deploying a change into
production within an hour, you achieved that goal. And it was a step-by-step approach: first forcing
ourselves to use our own software in our own, you know, internal dev stages to actually feel the pain
when something is not working, and with that increasing the quality, and especially
giving the individual teams more power, more responsibility, but also the ownership. So
does this kind of sum it up?
It's really a good summary. Thank you, Andy.
No, I think it's phenomenal. You know, it's amazing what you guys did. Yeah, Brian, anything from your side?
I've been talking too long.
I think, Andy, you would make a great proofreader and editor for, like, anyone who has to write a paper or report, especially if I think about what my seven-year-old daughter would write. And I don't mean, Anita, that what you were talking about
is in any way, shape, or form related to the way my seven-year-old daughter
would write something.
I don't mean it that way.
But you can just take almost anything and summarize it,
I think.
So yeah, I think you should look into that side of things, Andy.
Big change.
No, Anita, I thought that was amazing stuff.
I don't know if it's something maybe we want to briefly touch on in the next episode of this,
or maybe we could take a second.
But, you know, when, and it's been mentioned several times,
when Bernd said we're going to deliver every hour,
the one thing that struck me was very rightly so.
He said everyone kind of looked at him as in, are you crazy?
And I just kind of wanted to get your input on that.
Maybe we can kick off the next episode with that before we dive into the pieces, because that to me is always a fun thing where you hear, you know, we all sit in meetings from time to time and hear people talking. And, you know, at least in my
previous job, we would hear all these kind of lofty goals of what product management wanted to
do. And I would start tuning out because, you know, I knew in the culture we had, like, yeah,
that's never going to happen. But obviously when you, when you get that kind of edict
that you're going to go to CD and it's going to be every hour, you know, there's no avoiding that
one. So I just kind of wanted to get your thoughts of what you and the team might have thought when you heard that.
But we'll come back to that one.
Let's do that.
Yeah, it's amazing what you all did there and looking forward to the second half.
So we are going to... did you have any other thoughts before we wrap up today's, I mean the first half of this conversation, Anita?
No, I think Andy did a really good summary.
So I think we can stop here and go ahead with the second episode.
It's a good end.
And also what you mentioned is a good start.
Okay.
So everyone, we'll be back in a moment.
Thank you all for listening.
And just for the sake of continuation, you know, this is Pure Performance.
And you can contact us at pureperformance at dynatrace.com or Twitter us at hashtag pureperformance at dynatrace.
Any feedback, we need continuous feedback as well. So we would love to have that from you all.
And yeah, thanks for listening.
We'll be back in just a moment.