PurePerformance - 053 Feature Toggles: Implementing, Testing and Monitoring them!
Episode Date: January 15, 2018
Feature Toggles or Feature Flags are not new – but they are a hot topic as they allow safer and fearless continuous delivery. Finn Lorbeer ( https://twitter.com/finnlorbeer ) gives us technical insight into how he has been implementing feature toggles in projects he was involved in over the last years. We learn why he loves https://github.com/heartysoft/togglez, how to test feature toggles, monitor the impact of features being toggled, and how to make sure you don't end up in a toggle mess.
Transcript
It's time for Pure Performance.
Get your stopwatches ready.
It's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello, everybody, and we're back with part two of our conversation with Finn Lorbeer, which, no, does not count as another episode for him to be in the three-timer club yet.
But he's getting close.
Andy, we wanted to continue our conversation on another topic.
Why don't you introduce that topic and let's dive into it.
Yeah. So, Finn, in your previous talk, you talked about feature flags or feature toggles.
And I know I hear this more and more, and we use it internally at Dynatrace as well.
But I wanted to get your thoughts on how you start with feature toggles. Is it possible to take an existing code base and then start feature toggling?
Or do you think it is easier and better to think about feature toggles when you build something completely new?
And then there are a couple of other questions that I have.
So, kind of, what are your thoughts on how to get started?
And the other thing is, how do you manage feature toggles?
Because I assume the longer you build and work on a project, the more feature toggles you will have. So eventually, how do you consolidate? Will feature toggles at some point go away because you don't need to toggle them anymore, because they're just part of the default set? And how do you monitor feature on versus feature off? So I know there's a lot of questions, but maybe let's get started with...
You have to remember all of them. You have to remember all of them, and if you don't, we're not going to have you back.
Exactly. Maybe we'll get sidetracked along the way as well. So let's...
Yeah, I think it's easier to start on an existing project, to be honest. If you start on a greenfield, if you really start fresh, you have no users in the first place anyway, and there's no need to toggle any features. You just want to get your very first version of the product out. So unless you have users, it's not worth it to work with feature toggles.
And if you really build a small application
where every story adds a huge part of functionality,
I think it's also very difficult to toggle because any story will touch
a really big part of the application.
Then it's a really big task to just build in the feature toggle.
If you have a big, I don't know, legacy application that is four years old, thousands and thousands of lines of code, and you just know your little corner where you work, and inside this little corner you build a small additional functionality, then I think it's much easier to toggle just in the environment where you're in, if you want to just make sure that your change, or whatever you write, is dead code in production for now.
That's the idea, right?
And once you ship your feature, so the way we do it, or the way we always try to do it: while you develop it, the feature toggle is on on your local machine, obviously, and on some test environment, and it's switched off in production, so that users don't see half-baked features where styling is missing or some elements in the front end are missing.
Anything that looks weird, we don't want to show them yet.
That's all we want to achieve here.
So as soon as we're done with a story, or with a couple of stories that together are a feature that we want to toggle on, we toggle it on, we observe it, and a day later we should take the toggle out, usually, unless there is some reason to keep it in. So this is how we try to reduce the amount of toggles. The story is not really done, or the conglomerate of stories is not really done, if the feature toggle is still in there.
Okay. And how do you, so technically, are you typically seeing that you write your own feature toggle framework, or do you use frameworks that are out there?
And what frameworks can you recommend or have you worked with?
The one we work most with is the toggles with a Z in the end, T-O-G-G-L-E-Z, a library written in Java.
We also embedded it into our Clojure application for now.
It has a really nice front end.
And there comes another beauty of feature toggles.
You can go to your product owner in the end and say,
the feature is finished.
Do you want to deliver it to your users?
Do you want to hit the button that releases?
And they are super glad.
It's like they have tears in their eyes for joy.
And this is what this toggles framework gives you for free.
We didn't manage to get this in Clojure, but it's a very nice start.
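For context, basic usage of the Togglz library on the Java side looks roughly like this. This is only a minimal sketch, assuming the togglz-core module (the product-owner front end he mentions ships as a separate togglz-console module), and the feature name and guarded code paths are made up for illustration:

```java
// Minimal Togglz sketch: features are declared as an enum implementing Feature,
// and code paths are guarded with isActive(). Feature name is hypothetical.
import org.togglz.core.Feature;
import org.togglz.core.annotation.Label;
import org.togglz.core.context.FeatureContext;

public enum MyFeatures implements Feature {

    @Label("New messaging panel")
    NEW_MESSAGING;

    public boolean isActive() {
        return FeatureContext.getFeatureManager().isActive(this);
    }
}

// At the call site, the toggle is essentially just an if statement:
// if (MyFeatures.NEW_MESSAGING.isActive()) { renderNewMessagingPanel(); }
// else                                     { renderLegacyMessagingPanel(); }
```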
I've also seen people building it on their own. Or, on the other hand, if you have a small project, if you just have one or two things that you want to toggle, don't start with a framework. A toggle is nothing but an if statement, if you want, in the beginning.
And you can also do this in the very beginning. What I always want to do is to decouple deployment from feature toggles. So I want to deploy at some point, and at some other random point I want to flip the feature toggle. But in the beginning, when you have just one or two things that you may want to show or not, just make it an if statement, a global variable that is on or off, and then you have a feature toggle already. You don't need anything fancy for it. It's more the concept and the thinking of how you deliver your software using this toggle that is the challenging part.
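In its simplest form, that really is just an if statement around a flag that can be flipped without redeploying the code. A minimal sketch, with a hypothetical flag name and made-up code paths:

```java
// The simplest possible feature toggle: an if statement around a flag.
// Reading the flag from an environment variable (or a config file) is what
// decouples flipping the toggle from deploying the code.
public class SimpleToggleExample {

    // Hypothetical flag name; defaults to "off" so production never shows
    // the half-baked feature by accident.
    static boolean newCheckoutEnabled() {
        return Boolean.parseBoolean(
                System.getenv().getOrDefault("FEATURE_NEW_CHECKOUT", "false"));
    }

    public static void main(String[] args) {
        if (newCheckoutEnabled()) {
            System.out.println("Rendering the new checkout flow");
        } else {
            System.out.println("Rendering the existing checkout flow");
        }
    }
}
```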
And only when you have five teams across two continents that work on the same product, then you need to find a way that the pieces of work that you identified, and the areas or code bases or parts that the teams or pairs touch, are decoupled enough so that the feature toggles don't influence each other.
Because if they influence each other,
testing becomes a nightmare
because you can't test all permutations of all the toggles that you have in your code base then.
So then that brings me to two more questions. The first thing is, how do you test
for correct toggling? So that means if you say essentially a toggle is like an if statement
around the code block, how can you make sure that the developers don't forget to put the if statement around the code block that is part of that feature, especially if a feature is distributed across different classes and throughout the code base? Is there any best practice, anything?
Hmm, chocolate, for once.
Yeah, in fact, so what works really well, really well, I've tried it three times in three different teams and it worked each time, is to give chocolate for well-implemented feature toggles at the end of the story, just in the stand-up, and just occasionally, right? If you're in the stand-up, you're talking, and the story was done, and you have a feature toggle, and yeah, it worked well and everything was nice, you just say, oh, I want a chocolate.
And then, what do we do? And everybody else is looking.
So it's really mean, but it works.
So that's obviously one part.
It's such a kindergarten approach.
It's amazing.
Yeah, yeah, yeah.
But keep it simple.
And the other thing is for backend toggles, it's more difficult.
I don't have the strategy yet.
For frontend, it's much easier because if you don't toggle your frontend properly, you will have changes in your frontend in any deploy.
So what you can do is to make a pixel comparison of your product before you ship it.
What I always want to do, and what we will implement here now that we just went live, now that we just started with this proper feature toggling, the next step for us here will be to have visual regression testing, but automated, so that we don't need a tester to look.
We take a snapshot of our product, of the website, let's say of the start page, immediately before we release, because we know there is no feature yet, there is no feature toggle yet; this is how the website should look without anything changing.
Then we deploy the first commit, which is feature-toggled, hopefully. So afterwards the website should still look the same. We take a snapshot afterwards and we compare the two of them. They should be the same. Then we trash them; there's no problem with archiving, no problem if Firefox versions upgrade in the meantime.
It's just: take a snapshot, wait five minutes for the deploy, or less, take another snapshot. If you have some things that sneak out, you will have front-end changes, and then you detect them. If everything moves down by 10 pixels, you will find it immediately, which a manual tester would never see.
That's the strategy for front-end.
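A rough sketch of that before/after comparison, assuming the two screenshots already exist as PNG files; how they are captured (headless browser, Selenium, etc.) is out of scope here, and the file names are invented for the example:

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;

// Pixel-by-pixel comparison of two snapshots taken immediately before and
// immediately after a feature-toggled deploy. Any difference means the
// toggled-off feature is leaking into the front end.
public class SnapshotDiff {
    public static void main(String[] args) throws Exception {
        BufferedImage before = ImageIO.read(new File("startpage-before-deploy.png"));
        BufferedImage after  = ImageIO.read(new File("startpage-after-deploy.png"));

        if (before.getWidth() != after.getWidth() || before.getHeight() != after.getHeight()) {
            throw new AssertionError("Page dimensions changed - the toggle may be leaking");
        }

        long diffPixels = 0;
        for (int y = 0; y < before.getHeight(); y++) {
            for (int x = 0; x < before.getWidth(); x++) {
                if (before.getRGB(x, y) != after.getRGB(x, y)) {
                    diffPixels++;
                }
            }
        }
        if (diffPixels > 0) {
            throw new AssertionError(diffPixels + " pixels changed - the toggled-off feature is visible");
        }
        System.out.println("Snapshots match - safe to trash them.");
    }
}
```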
Yeah.
And so then you said, you know, once a feature is, you know,
you turn on the feature, you let it run, and then eventually,
as you said, the feature toggle should be removed once we know this thing is stable.
That means you just then do, as part of a deployment, you just remove the if statements around the toggle.
That's basically it.
It just becomes part of the next deployment.
And then the if statement that returns true anyway, because the toggle was on, will just be removed and the code runs just as before.
Exactly. And in the snapshot comparison it also looks the same, because it was switched on before and it's still switched on afterwards, just with the code parts that are in there anyway by now removed. This should also match, and this works really well.
If you start to confuse feature toggles with A/B testing frameworks, or if you start confusing feature toggles with self-healing systems, then you will end up with a mess of feature toggles.
So you have to be a bit strict or disciplined about when you use what, and about whether you want to introduce a new framework to achieve something other than feature toggling for development.
And how do you monitor? What's the best way, any approaches to monitor? Or, well, let me ask you this question first, because then monitoring becomes easier to answer: do you normally turn on the feature for everyone, or just for a portion, where it then becomes more like A/B testing? Meaning, do you turn it on for every user?
That's the canary release you're talking about, right? Just some users, and see how it behaves. Until now, I have to say, we didn't need that yet. I haven't been on a project, I know that you can do it, I know the charm of it, but we didn't need it yet, because the features we shipped so far were stable enough. We didn't have performance problems, or if you have performance problems, you usually don't see them with the first three users; at the moment we have 20, so we switch it on for them anyway. I simply didn't need it so far. It's a nice concept, but if you don't need it, you don't have to introduce it right away.
For the monitoring part: if you flip a feature toggle, so every commit, every deployment is fully automatically tested before you release. A feature toggle, that's a totally new functionality, and you probably want to test this manually, exploratory, and very seriously, with a product owner and maybe even some devs.
And me as a QA, then I'm super, super curious how the entire software behaves and how I can break it.
That's still part of my job, just not my only job.
And then the feature toggle is toggling a new feature on or new functionality. It's not an everyday event, right?
That's maybe every week if you're really fast or even less.
So that's something special.
And then you monitor manually.
You look on your dashboards.
You look on your database requests.
You look on your response times.
You look on your error rate and everything.
And if you see this rising up, then you toggle it off again.
And if you see everything is fine, you leave the toggle.
If it's still fine the next day, it's probably good enough to be removed.
Well, I mean, you're right.
You probably want to look at the – especially user behavior, right?
But if you expose real users to a feature that you have tested well enough and you think it's perfect, your real user may still think differently about it.
So that's why, watching these metrics makes a lot of sense. What we try to do now, and I know you said you're watching performance and failure rate and then toggling it back manually, that's obviously one thing you can do. What we try to do now is, with our monitoring data, to actually automate the mitigation. So that means, let's assume this happens at two o'clock in the morning, that something went wrong.
Then the auto mitigation could say, hey, we are automatically turning off the toggle
because this page that is now slow was just toggled on with this new feature.
So let's toggle it off at 2 o'clock in the morning and let's not wake people up,
but they can come in at 8 o'clock in the morning and then
figure out why we had to toggle it off. So this is something we try to do now
with the data we collect and how we integrate with frameworks
that for instance turn on and off certain features.
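As a sketch of what such an auto-mitigation loop could look like, this is purely illustrative and not any particular product's API: the metrics source, the threshold, and the toggle store interfaces are all hypothetical stand-ins.

```java
import java.time.Duration;

// Hypothetical auto-mitigation: if the page behind a freshly enabled toggle
// gets too slow, flip the toggle back off and leave a note for the team
// instead of paging anyone at 2 a.m.
public class ToggleAutoMitigation {

    // Stand-ins for a real metrics source and toggle store (for example a
    // monitoring API and a Togglz state repository); both are assumptions.
    interface Metrics { Duration medianResponseTime(String page); }
    interface ToggleStore { void disable(String featureName); }

    private static final Duration THRESHOLD = Duration.ofSeconds(3); // made-up limit

    static void check(Metrics metrics, ToggleStore toggles,
                      String page, String featureName) {
        Duration current = metrics.medianResponseTime(page);
        if (current.compareTo(THRESHOLD) > 0) {
            toggles.disable(featureName);
            System.out.printf(
                "Auto-mitigation: %s took %dms, toggled %s off; investigate in the morning.%n",
                page, current.toMillis(), featureName);
        }
    }
}
```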
I wanted to ask, and I think I know the answer
but I'm not quite sure because I was thinking about this while you all were talking.
And if we look at the microservice model, which I know I'm going there first because we're talking about starting this in more of a monolithic approach because that's typically where you have your legacy code, right?
But if we were talking about a microservice topology where you have anybody with their own service might be using your service, right?
One of the things you have to take care about is versioning and supporting backward compatibility.
Because if you make a change with your feature, let's say, you might break another service that's using you that's not expecting that change.
And there's all different kind of best practices in order to handle that.
But I don't know if that really applies. And I guess that's really the question: does this apply when you're talking about feature toggles in more of a monolithic code base? I would assume maybe not, because in your monolithic code you kind of know everyone and everything that's talking to your feature. So it's not like you can just have some people randomly spinning up a new service. It's very well known what's accessing it. And any changes you make, you would, I think, automatically know the impact, and who your consumers are, who the customers of your feature are, so that it would be a lot less of a risk. Do you think that's the case? Or...
I don't know if this is only true, you know, if you start with two or three services and it's two or three teams sitting next to each other, it's easy to keep an eye on it. If I think back to the auto environment, where it's more than 300 services and 14 teams, I wouldn't be sure that everybody really knows who's consuming what API at what point in time. So I also think that there may be unintended changes. However, yeah.
I was going to say, but in more of a monolithic
kind of code, like you're describing right here, does that come into play as much because you
suddenly don't have a bunch of different services or unknown amounts of teams interfacing because
it's all basically this monolithic code base? Or am I thinking about it completely wrong?
So when you have a monolithic application where everything cross-talks with everything else and you add a new feature somewhere, if you add new functionality, it's always easy to toggle, right? Because you shouldn't influence too much what is going on.
So the way we did it in those cases, for bigger applications, was to refactor the parts we want to work on before we would add new functionality, so that they would be more suited for a feature toggle.
Maybe you can also strip down an existing monolith into something more domain-driven, or into a domain design, where you get to know more or less the peers you're talking to. And then you can add it as well.
A refactoring doesn't necessarily need a feature toggle, because a refactoring shouldn't change your functionality. You should just remove some of the spaghetti code where you struggle to find the places where you communicate with other teams or other parts of the system.
But I'm trying to remember if we had any major problems, exaggeratedly speaking, just throwing a feature toggle somewhere in the middle of the monolithic code base. Rarely, rarely has the introduction of a feature toggle caused massive problems for other teams that were relying on one shared code base.
Okay. And you also talked about the
permutations of testing. So how do you handle, like, is it, or maybe the whole idea is to avoid it, but let's say you have 15 features that can be toggled on and off in a release, or that are alive at the same time.
Is that a situation where you try, because I can imagine the test permutations, of testing with features 1 and 2 on, then 1, 2 and 3, or 7, 1 and 2, become way too uncontrollable?
Is it more of a situation where the rule is you can only have a certain number of feature toggles in a code base at a time, to be able to accommodate testing well?
I think it's a theoretical versus a very real-world pragmatic problem.
We discuss it all the time.
We discussed it here on our project just last week on Thursday.
So don't get me wrong.
Let's say you have 15 feature toggles.
Half of them are already switched on and half of them are not because they're still in development.
Two of them are long forgotten and all kinds of combinations, right?
But the next feature, the next feature toggle that will be toggled on is the next feature we are building, this messaging system,
I don't know what, that we want to toggle next week.
And it's a standalone release of this messaging system, right?
So it's not like feature toggles are toggled on and off all the time.
So we know pretty well what is currently toggled on in production.
And we have one test environment before we release
that has the same toggle configuration as our production environment.
So before we release, the very last high-level, functional, end-to-end integration test environment has the same feature toggle configuration as our production environment.
So as long as we don't flip feature toggles, we have just one configuration that we have
to test.
We don't have to test permutations.
But now you may toggle on or off one or the other feature, but that rarely happens at random.
And if it does, it's mostly in an error case where everything went wrong already, so how do you test for everything going to hell at that moment? Then you need to react anyway.
So you can do a little more harm, I would say.
And it's really rare.
And you test before you toggle.
We also test manually, right, like I said, on this environment.
So we make sure that we only switch on the toggle on the last environment before live production.
And then we manually test the same configuration.
If we are a bit unsure, we can still run the same test suite again and make sure it runs.
And then we have all the toggles in production.
And just one is different to the other environment.
And then we go and say, okay, the feature is fine.
We toggle it on in production.
We have the same configuration again.
So this is what I mean by, in theory,
all toggles could be different,
but in practice, a flip of a feature toggle happens only occasionally, and therefore you don't have to test all permutations at all points in time.
So you shouldn't have that business owner with tears in their eyes turning features on and off at will.
That should be much more planned.
Yeah, right.
Going home and in the evening just saying, oh, this looks nice.
I'll turn it on.
I think it's ready enough.
Yeah.
No, that shouldn't work this way.
A more disciplined and mature team, as I said in the other episode.
Another way, if you're a bit unsure, what you can also do, what worked well on another project, was to toggle all features off and run the tests, and toggle all features on and run the tests. Because feature toggles are really new functionality, and they shouldn't, or they rarely, change your core business or the core user flow or the core of your product. They are tiny bits and pieces that are added everywhere, so they will rarely break your big functional tests, unless you have a massive test suite that has everything; then you're doomed in many other ways. But if you have a very thin layer of end-to-end tests that just test your core business process, it usually isn't affected by those feature toggles.
So we toggle all off, let the tests run, toggle all on, let the tests run; it's green both times, and we deploy it then.
We made, it sounds a bit funny, but we made really good experiences with that as well.
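A sketch of that all-off/all-on check as a tiny runner; the in-memory toggle map and the placeholder end-to-end suite are hypothetical stand-ins for whatever toggle store and test suite a real project has.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// "All off, all on": run the thin end-to-end suite twice, once with every
// toggle disabled and once with every toggle enabled, and only deploy when
// both runs are green.
public class ToggleMatrixCheck {

    static final Map<String, Boolean> TOGGLES = new LinkedHashMap<>();

    // Placeholder for the real end-to-end suite covering the core business flow.
    static boolean coreBusinessFlowIsGreen() {
        return true;
    }

    public static void main(String[] args) {
        TOGGLES.put("NEW_MESSAGING", false);
        TOGGLES.put("REDESIGNED_START_PAGE", false);

        TOGGLES.replaceAll((name, value) -> false);
        boolean allOffGreen = coreBusinessFlowIsGreen();

        TOGGLES.replaceAll((name, value) -> true);
        boolean allOnGreen = coreBusinessFlowIsGreen();

        if (allOffGreen && allOnGreen) {
            System.out.println("Both toggle configurations are green - deploy.");
        } else {
            throw new AssertionError("Core flow broke in one of the toggle configurations.");
        }
    }
}
```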
You know, the description of this sounds like it's a hack way
to be able to deploy features
in a microservice style to monolithic code
because typically you think of a monolithic code
as being large releases with a bunch of changes.
There's not too often a case where there's like a small release.
You know, like with microservices,
you can make a small change to one service and deploy it; with a monolith, it's a lot more time consuming and the impacts are a lot larger.
But by putting in these feature flags,
you're effectively allowing for these little small changes to go in,
in that microservice spirit,
without actually breaking your monolith
down into microservices yet.
Yeah, you're decoupling bits and pieces of your application in order to release them independently, in a way. But everything I said applies just as much to feature toggles for microservices, right? Even if you have microservices, you don't want to ship half-baked front-end features.
Yeah, but that's a given in microservices. There's always this idea with a monolith that it's monolithic and everything's got to be
big.
This is a way to make small changes in monolithic in the spirit of microservices without actually
having broken your product into microservices yet.
So it's pretty cool.
It's a cool little hack around.
I think the bigger challenge for monolith usually is to get the pipeline fast enough
to make it worth the effort.
Many monoliths need a lot of hand-holding while being released and a lot of care while being shipped.
I think this is almost the bigger challenge than, once you've done this, to also add a few if statements or a toggles library, to be honest. So I think this is even the smaller, the easier part with these feature flags.
Cool. Hey, this was quite a nice excursion into feature flags and feature toggles, and thanks for the insights. Did we answer all the questions from the beginning?
I think, I think we did, yeah, because we talked about, yeah, we definitely. I think you've earned a badge.
So we may come up with some badges we use.
You can print it out and put it in your email signature.
Exactly.
Excellent. I know you mentioned Togglz as the framework. Any others? Because, you know, it's great to name one, but maybe it's better to name multiple, in case you have some other toggle frameworks that you've seen.
Everywhere, if there was a library, it was this one. Otherwise, it was way, way, way smaller custom-built toggles, because it was smaller services that didn't require big libraries. I'm sorry.
Yeah. That's cool.
Perfect. If you use it that heavily,
then I'm sure it's a great framework and people should
look into that.
It's a big and old one. It's
well-supported, it's really doing the job.
Yeah, perfect. The only thing that I wanted to add to the monitoring aspect: I know we have some listeners that are obviously using our products. We have a great feature that maybe not all people know about, especially when you do front-end features. If you are turning on a toggle, and let's assume that toggle is obviously generating some new HTML to show that feature,
Dynatrace with our real user monitoring feature, our UEM, can actually capture this as metadata.
So we can actually say which users are actually exposed to which version or to which feature. And then we can
put them into categories and say, hey, how many people have been exposed to the version with the
feature toggle on versus feature toggle off, or this feature toggle and this feature toggle.
And then you can specifically monitor user behavior. So, for folks that are interested: we can capture metadata through UEM, through our JavaScript agent, and then we can analyze this. So just an FYI for people, for customers that are listening: check it out.
That's very, very nice. I can add that's very nice for prototyping, right? If you just throw in a quick click dummy as a prototype, you can just see if people react at all, and if it's worth spending two months to build the actual feature.
Exactly.
Very nice.
And the only other thing I would add to the monitoring there is to make sure you set an indicator in your monitoring tool that you've turned on, toggled on, a feature, so that if other things do start going south, you know, oh, we just turned on a feature as well.
Right, exactly. We can tell, we can now use our REST API to tell Dynatrace that we just made this configuration change, or deployment, or in this case it would be a configuration change, right? Which it is, right.
Yeah. And Finn, were you practicing clapping? I heard a couple of those going on. Were you testing out the –
Was I?
No, no, no.
I don't know.
I don't know if you remember.
A year ago, when we did the first episode, we laughed hard that I was sitting there, with my last name with "beer" in it, in Germany, after four, and I didn't have a beer.
Ah, yes.
So this time I brought it.
Maybe I put it down on the table a bit too loud.
I'm sorry.
No, that's okay.
I was just referencing back to the previous episode when we were talking about, so you know, in order to have your beer make a sound, you have to have the table and the beer. They have to work together.
Exactly. Yeah, that's perfect. So now I have the last question: what's the brand, what's the brewery?
Berliner Berg. It's a new one, and it's a little brewery that brews really nice beer, not the standard huge breweries, you know, that all produce the same product that is just slightly different, but a beer with character.
And the only brand we have in the fridge here at work at the moment.
But it's nice.
Oh, Prost.
Sorry?
I said Prost.
Prost, Prost.
Yes, Prost. Cheers. Na zdrowie. All right.
All right. I would say, hey, Finn, thanks again for doing the second episode. I think it was very insightful, and I think there are many other of these techniques that we should actually cover in future episodes besides feature toggling. I think in the future we should talk about blue-green deployments or canary releases.
There's all sorts of techniques that favor rapid deployments
or dark deployments, whatever you want to call it.
Cautious deployments even, yeah. Maybe, yes.
Yeah. So we should think about that.
And now you said you want to come back anyway once your project kind of evolves a little bit, and then we can talk about this, and then we should also think about other techniques.
Awesome.
Other than that, I say to summarize, it seems feature flags, there's a lot of great advice that you gave on how to test. Obviously, if you have multiple feature flags, you want to make sure you always test with what is currently in production and then turn the feature flag on that you want to turn on soon.
So please coordinate with your business analysts or with your product owners.
And I guess what I also learned is that feature flags don't stay around forever; the features that you toggle eventually obviously just become part of your standard features.
So you just basically remove the if statement,
which it basically is in the next deployments.
So that was interesting.
And yeah, monitoring is important.
Figure out the user behavior,
how users react to the new features
that they get exposed to.
And yeah, start experimenting. I think that's also a cool thing, start experimenting with new features, and use feature flags for that as well.
And all I've got to say is thank you. I always enjoy our talks. I'm always overwhelmed with new information that I get to think about and ponder upon. So it's always greatly appreciated.
So thank you once again for joining us.
Yeah, it was a very cool episode, I think.
No, that was a beautiful summary.
I mean, if I could have said this in those nice five sentences,
it would have been a two-minute episode.
Next time, maybe.
But then we would have had all the nuance, and that is where you get the learning from.
But Andy, Andy's got a special skill, and one day we'll see him doing something much grander with it.
But at least, at least I have one. Yeah, at least I have one special skill.
Well, that and your dancing. Your dancing is number two, right?
Yeah.
That's maybe, yeah, that's true. All right, yes.
All right, well, thank you again.
It's been a pleasure to have you.
And we'll see who...
Thank you, too.
We'll have to see who becomes a member of the Three Timers Club first.
So, challenge is on, people.
Challenge accepted.
All right.
Thank you very much.
Bye-bye.
Bye-bye. Bye-bye.