PurePerformance - 053 Feature Toggles: Implementing, Testing and Monitoring them!

Episode Date: January 15, 2018

Feature Toggles or Feature Flags are not new – but they are a hot topic as they allow safer and fearless continuous delivery. Finn Lorbeer ( https://twitter.com/finnlorbeer ) gives us technical insight into how he has been implementing feature toggles in the projects he was involved in over the last years. We learn why he loves https://github.com/heartysoft/togglez, how to test feature toggles, how to monitor the impact of features being toggled, and how to make sure you don’t end up in a toggle mess.

Transcript
Starting point is 00:00:00 It's time for Pure Performance. Get your stopwatches ready. It's time for Pure Performance with Andy Grabner and Brian Wilson. Hello, everybody, and we're back with part two of our conversation with Finn Lorbeer, which, no, does not count as another episode for him to be in the three-timer club yet. But he's getting close. Andy, we wanted to continue our conversation on another topic. Why don't you introduce that topic and let's dive into it. Yeah. So, Finn, in your previous talk, you talked about feature flags or feature toggles.
Starting point is 00:00:52 And I know I hear this more and more, and we use it internally at Dynatrace as well. But I wanted to get your thoughts on: how do you start with feature toggles? Is it possible on an existing code base to then start feature toggling? Or do you think it is easier and better to think about feature toggles when you build something completely new? And then there are a couple of other questions that I have. So, what are your thoughts on how to get started? And the other thing is: how do you manage feature toggles? Because I assume the longer you build and work on a project, the
Starting point is 00:01:25 more feature toggles you will have. So, eventually, how do you consolidate? Will feature toggles at some point go away, because you don't need to toggle them anymore, because they're just part of the default set? And how do you monitor feature on versus feature off? So I know there's a lot of questions, but maybe let's get started with... You have to remember all of them. And if you don't, we're not going to have you back. Exactly. Maybe we'll get sidetracked along the way as well. So let's... Yeah. I think it's easier to start on an existing project, to be honest. If you start on a green field, if you really start fresh,
Starting point is 00:02:05 you have no users in the first place anyway, and there's no need to toggle any features. You just want to get your very first version of the product out. So unless you have users, it's not worth it to work with feature toggles. And if you really build a small application where every story adds a huge part of functionality,
Starting point is 00:02:27 I think it's also very difficult to toggle, because any story will touch a really big part of the application. Then it's a really big task to just build in the feature toggle. If you have a big, I don't know, legacy application that is four years old, thousands and thousands of lines of code, and you just know your little corner where you work, and inside this little corner you build a small additional functionality, then I think it's much easier to toggle, just in the environment where you're in, if you want to just make sure that your change, or whatever you write,
Starting point is 00:03:05 is that code in production for now. That's the idea, right? And once you ship your feature, so the way we do it, or the way we always try to do it, while you develop it, the feature toggle is on on your local machine, obviously, on some test environment,
Starting point is 00:03:20 and it's switched off in production, so that users don't see half-baked features where styling is missing or some elements in the front end are missing. Anything that looks weird, we don't want to show them yet. This is all we want to achieve here. So as soon as we're done with a story, or with a couple of stories that together form a feature that we want to toggle on, we toggle it on, we observe it, and a day later we should take the toggle out, usually, unless there
Starting point is 00:03:45 is some reason to keep it in so this is how we try to reduce the amount of toggles the story is not really done or the the conglomerate of stories is not really done if the feature toggle is still in there okay and how do you so technically are you typically seeing that you write your own feature toggle framework so do you use frameworks that you write your own feature toggle framework? So do you use frameworks that are out there? And what frameworks can you recommend or have you worked with? The one we work most with is the toggles with a Z in the end, T-O-G-G-L-E-Z, a library written in Java. We also embedded it into our Clojure application for now.
Starting point is 00:04:25 It has a really nice front end. And there comes another beauty of feature toggles: you can go to your product owner in the end and say, the feature is finished. Do you want to deliver it to your users? Do you want to hit the button that releases it? And they are super glad. It's like they have tears in their eyes for joy.
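For reference, a Togglz feature definition is just a Java enum implementing the library's Feature interface, roughly like the sketch below; the NEW_MESSAGING name is made up for illustration and is not from Finn's project:

```java
import org.togglz.core.Feature;
import org.togglz.core.annotation.Label;
import org.togglz.core.context.FeatureContext;

// Each enum constant is one toggle; the Togglz admin console lists them
// by their labels and lets a product owner flip them at runtime.
public enum Features implements Feature {

    @Label("New messaging system")
    NEW_MESSAGING;

    // Convenience check used at the toggle points in the code.
    public boolean isActive() {
        return FeatureContext.getFeatureManager().isActive(this);
    }
}
```

At the call site that guards a half-baked feature, the check then reads simply: if (Features.NEW_MESSAGING.isActive()) { ... }.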
Starting point is 00:04:42 And this is what the Togglz framework gives you for free. We didn't manage to get this in Clojure yet, but it's a very nice start. I've also seen people building it on their own. Or, on the other hand, if you have a small project, if you just have one or two things that you want to toggle, don't start with a framework. A toggle is nothing but an if statement, if you want, in the beginning. And, also in the very beginning, what I always want to do is to decouple deployment from feature toggles. So I want to deploy at some point, and at some other random point,
Starting point is 00:05:14 I want to flip the feature toggle. But in the beginning, when you have just one or two things that you may want to show or not, just make it an if statement on a global variable that is on or off, and then you have a feature toggle already. You don't need anything fancy for it. It's more the concept and the thinking of how you deliver your software using this toggle that is the challenging part.
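A minimal sketch of that hand-rolled variant in Java; the flag name and the idea of reading it from an environment variable (so the same build can run with the feature on in test and off in production) are illustrative assumptions, not something prescribed in the episode:

```java
// One global flag, one if statement: the simplest possible feature toggle.
public final class Toggles {

    // Hypothetical flag; read once at startup from the environment so the
    // same artifact behaves differently per environment without a rebuild.
    public static final boolean NEW_CHECKOUT = Boolean.parseBoolean(
            System.getenv().getOrDefault("FEATURE_NEW_CHECKOUT", "false"));

    private Toggles() {
    }
}
```

At the single place where the feature is wired in, the toggle is then literally just: if (Toggles.NEW_CHECKOUT) { renderNewCheckout(); } else { renderOldCheckout(); }.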
Starting point is 00:05:51 Only when you have five teams over two continents that work on the same product, then you need to find a way that the pieces of work that you identified, and the areas or code bases or parts that the teams or pairs touch, are decoupled enough so that the feature toggles don't influence each other. Because if they influence each other, testing becomes a nightmare, because you can't test all permutations of all toggles that you have in your code base. So that brings me to two more questions. The first thing is, how do you test for correct toggling? So that means, if you say essentially a toggle is like an if statement
Starting point is 00:06:17 around the code block, how can you make sure that the developers don't forget to put the if statement around the code block that is part of that feature, especially if a feature is distributed across different classes and throughout the code base? Is there any best practice, anything? Chocolate, for once. Yeah, in fact, what works really well (I've tried it three times in three different teams and it worked each time) is to give chocolate for well-implemented feature toggles at the end of the story, just in the stand-up, and just occasionally, right? If you're in the stand-up, you're talking, and the story was done, and you have a feature toggle, and yeah, it worked well and everything was nice, you just say, oh, I want a chocolate.
Starting point is 00:07:05 And then what do we do? And everybody else is looking. So it's really mean, but it works. So that's obviously one part. It's such a kindergarten approach. It's amazing. Yeah, yeah, yeah. But keep it simple.
Starting point is 00:07:23 And the other thing is, for backend toggles it's more difficult; I don't have the strategy yet. For frontend it's much easier, because if you don't toggle your frontend properly, you will have changes in your frontend in any deploy. So what you can do is to make a pixel comparison of your product before you ship it. What I always want to do, and what we will implement here now that we just went live, now that we just started with this proper feature toggling: the next step for us here as well will be to have visual regression testing, but automatically, so that we don't need the tester to look. We take a snapshot of our product, of the website, let's say of the start page,
Starting point is 00:08:01 immediately before we release, because we know there is no feature yet, there is no feature toggle yet; this is how the website should look without anything changing. Then we deploy the first commit, which is feature-toggled, hopefully, so afterwards the website should still look the same. We take a snapshot afterwards and we compare the two of them. They should be the same. Then we trash them; there's no problem in archiving, no problem if Firefox versions upgrade in the meantime. It's just: take a snapshot, wait five minutes for the deploy, or less, take another snapshot.
Starting point is 00:08:32 If you have some things that sneak out, you will have front-end changes, and then you detect them. If everything moves down by 10 pixels, you will find it immediately, which a manual tester would never see. That's the strategy for front-end. Yeah.
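A sketch of that before/after comparison using Selenium and a plain pixel diff; the URL, the fixed window size, and the manual "deploy happens here" step are placeholder assumptions:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.openqa.selenium.Dimension;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class VisualRegressionCheck {

    // Capture a screenshot of the given page.
    static BufferedImage snapshot(WebDriver driver, String url) throws Exception {
        driver.get(url);
        File png = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        return ImageIO.read(png);
    }

    // Pixel-by-pixel comparison: any difference means the deploy
    // leaked a visible front-end change past its toggle.
    static boolean identical(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) return false;
        for (int y = 0; y < a.getHeight(); y++)
            for (int x = 0; x < a.getWidth(); x++)
                if (a.getRGB(x, y) != b.getRGB(x, y)) return false;
        return true;
    }

    public static void main(String[] args) throws Exception {
        WebDriver driver = new FirefoxDriver();
        driver.manage().window().setSize(new Dimension(1280, 1024)); // keep snapshots comparable
        try {
            BufferedImage before = snapshot(driver, "https://example.com/");
            // ... the deploy of the (toggled-off) commit happens here ...
            BufferedImage after = snapshot(driver, "https://example.com/");
            System.out.println(identical(before, after)
                    ? "No visible change: the toggle is holding."
                    : "Front-end changed: something sneaked past a toggle.");
        } finally {
            driver.quit(); // the snapshots are throwaway, nothing is archived
        }
    }
}
```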
Starting point is 00:08:51 And so then you said, you know, once you turn on the feature, you let it run, and then eventually, as you said, the feature toggle should be removed once we know this thing is stable. That means you just then, as part of a deployment, remove the if statements around the toggle. That's basically it. It just becomes part of the next deployment, and then the if statement, which returns true anyway because the toggle was on, will just be removed, and the code runs just as before. Exactly. And in the snapshot comparison it also looks the same, because it was switched on before and it's still switched on afterwards, just with the code parts that were in there anyway by now
Starting point is 00:09:35 removed. And this should also match. And this works really well. If you start to confuse feature toggles for A/B testing frameworks, or if you start confusing feature toggles with self-healing systems, then you will end up with a mess of feature toggles. So you have to be a bit strict, or disciplined, about when you use what, and whether you want to introduce a new framework to achieve something else than feature toggling for development. And how do you monitor... what's the best way, any approaches to monitor? Or, well, let me ask you this question first, because then monitoring becomes easier to answer: do you normally turn on the feature for everyone, or just for a portion, where it then becomes more like A/B testing? Meaning, do you turn it on for every user? That's the canary release you're talking about, right? Just some users, to see how it behaves. Until now, I have to say, we didn't need that yet. I haven't been on a project... I know that you can do it, I know the charm of it, but we didn't need it yet, because
Starting point is 00:10:44 the features we shipped so far were stable enough. We didn't have performance problems. Or, if you have performance problems, you usually don't see them with the first three users; at the moment we have 20, so we switch it on for them anyway. I simply didn't need it so far. It's a nice concept, but if you don't need it, you don't have to introduce it right away. For the monitoring part: if you flip a feature toggle... so every commit, every deployment, is fully automatically tested before you release. A feature toggle, that's a totally new functionality, and you probably want to test this manually, exploratory, and very seriously, with a product owner and maybe some devs even.
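As an aside: for teams that do want the canary-style rollout touched on above, Togglz ships percentage-based activation strategies. A sketch, with the caveat that the constant names follow my reading of the Togglz activation-strategy API and should be checked against its docs:

```java
import org.togglz.core.activation.GradualActivationStrategy;
import org.togglz.core.context.FeatureContext;
import org.togglz.core.repository.FeatureState;

public class CanaryRollout {

    // Enable the feature for roughly five percent of users rather than all.
    public static void enableForFivePercent() {
        FeatureState state = new FeatureState(Features.NEW_MESSAGING)
                .setStrategyId(GradualActivationStrategy.ID)
                .setParameter(GradualActivationStrategy.PARAM_PERCENTAGE, "5")
                .enable();
        FeatureContext.getFeatureManager().setFeatureState(state);
    }
}
```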
Starting point is 00:11:30 And me as a QA, then I'm super, super curious how the entire software behaves and how I can break it. That's still part of my job, just not my only job. And then the feature toggle is toggling a new feature on, or new functionality. It's not an everyday event, right? That's maybe every week if you're really fast, or even less. So that's something special. And then you monitor manually. You look at your dashboards. You look at your database requests.
Starting point is 00:11:56 You look at your response times. You look at your error rate and everything. And if you see this rising up, then you toggle it off again. And if you see everything is fine, you leave the toggle on. If it's still fine the next day, it's probably good enough to be removed. Well, I mean, you're right. You probably want to look at the... especially user behavior, right? But if you expose real users to a feature that you have tested well enough and you think it's perfect, your real users may still think differently about it.
Starting point is 00:12:23 So that's why watching these metrics makes a lot of sense. What we try to do now... and I know you said you're watching performance and failure rate and then toggle it back manually; that's obviously one thing you can do. What we try to do now is, with our monitoring data, to actually then automate the mitigation. So that means, let's assume this happens at two o'clock in the morning, that something went wrong. Then the auto-mitigation could say, hey, we are automatically turning off the toggle, because this page that is now slow was just toggled on with this new feature. So let's toggle it off at 2 o'clock in the morning, and let's not wake people up; they can come in at 8 o'clock in the morning and then figure out why we had to toggle it off.
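A sketch of what such an auto-mitigation loop could look like; the monitoring query is a hypothetical stand-in (no concrete monitoring API is named here), and the toggle is flipped through Togglz purely as an example:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.togglz.core.context.FeatureContext;
import org.togglz.core.repository.FeatureState;

public class ToggleAutoMitigation {

    static final double RESPONSE_TIME_SLO_MS = 500; // illustrative threshold

    // Hypothetical stand-in: wire up your monitoring tool's API here to
    // return the current median response time of the freshly toggled page.
    static double currentMedianResponseTimeMs() {
        throw new UnsupportedOperationException("not wired up in this sketch");
    }

    public static void main(String[] args) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            if (currentMedianResponseTimeMs() > RESPONSE_TIME_SLO_MS) {
                // Turn the feature off instead of paging anyone at 2 a.m.;
                // the team can investigate at 8 o'clock in the morning.
                FeatureContext.getFeatureManager()
                        .setFeatureState(new FeatureState(Features.NEW_MESSAGING, false));
            }
        }, 1, 1, TimeUnit.MINUTES);
    }
}
```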
Starting point is 00:13:04 So this is something we try to do now with the data we collect and how we integrate with frameworks that, for instance, turn on and off certain features. I wanted to ask, and I think I know the answer, but I'm not quite sure, because I was thinking about this while you all were talking. Let's look at the microservice model, which, I know, I'm going there first because we're talking about starting this in more of a monolithic approach, because that's typically where you have your legacy code, right? But if we were talking about a microservice topology, where anybody with their own service might be using your service, right?
Starting point is 00:13:48 One of the things you have to take care about is versioning and supporting backward compatibility. Because if you make a change with your feature, let's say, you might break another service that's using you that's not expecting that change. And there's all different kinds of best practices in order to handle that. But I don't know if that really applies. And I guess that's really the question: does this apply when you're talking about feature toggles and something in more of a monolithic code base? I would assume maybe not, because in your monolithic code you kind of know everyone and everything that's talking to your feature. So it's not like you can just have some people randomly spinning up a new service. It's very well known what's accessing it. And any changes you make, you would, I think, automatically know the impact and who your consumers are, who the customers of
Starting point is 00:14:37 your feature are, so that it would be a lot less of a risk. Do you think that's the case? I don't know if this was only true... You know, if you start with two or three services and it's two or three teams sitting next to each other, it's easy to keep an eye on it. If I think back to the AutoScout24 environment, where it's more than 300 services and 14 teams, I wouldn't be sure that everybody really knows who's consuming what API at what point of time. So I also think that there may be unintended changes. However, yeah. I was going to say, but in more of a monolithic kind of code, like you're describing right here, does that come into play as much? Because you suddenly don't have a bunch of different services or unknown amounts of teams interfacing, because
Starting point is 00:15:20 it's all basically this monolithic code base. Or am I thinking about it completely wrong? So, when you have a monolithic application where everything cross-talks with everything else and you add a new feature somewhere, if you add a new functionality, it's always easy to toggle, right? Because you shouldn't influence too much what is going on. So the way we did it in those cases, for bigger applications, was to refactor the parts where we want to work on before we would add new functionality, so that they would be more suited for a feature toggle.
Starting point is 00:15:58 Maybe you can also strip down an existing monolith into something more domain-driven, or into a domain design where you get to know, more or less, the peers you're talking to. And then you can add it as well. A refactoring doesn't necessarily need a feature toggle, because a refactoring shouldn't change your functionality. You should just remove some of the spaghetti code where you struggle to find the places where you communicate with other teams or other parts of the system. But I'm trying to remember if we had any major problems, exaggeratedly speaking,
Starting point is 00:16:42 just throwing a feature toggle somewhere in the middle of the monolithic code base. Rarely, rarely has the introduction of a feature toggle caused massive problems for other teams that were relying on one shared code base. Okay. And you also talked about the permutations in testing. So how do you handle... or maybe the whole idea is to avoid it, but let's say you have 15 features that can be toggled on and off in a release, or that are alive at the same time. Is that a situation where you try... because I can imagine the permutations of testing with features 1 and 2 on, or 1, 2 and 3, or 7, 1 and 2, become way too uncontrollable. Is it more of a situation where the rule is you can only have a certain amount of feature toggles in a code base at a time, to be able to accommodate testing well?
Starting point is 00:17:40 I think it's a theoretical versus a very real-world pragmatic problem. We discuss it all the time. We discussed it here on our project just last week on Thursday. So don't get me wrong. Let's say you have 15 feature toggles. Half of them are already switched on and half of them are not because they're still in development. Two of them are long forgotten and all kinds of combinations, right? But the next feature, the next feature toggle that will be toggled on is the next feature we are building, this messaging system,
Starting point is 00:18:11 I don't know what, that we want to toggle next week. And it's a standalone release of this messaging system, right? So it's not like feature toggles are toggled on and off all the time. So we know pretty well what is currently toggled on in production. And we have one test environment before we release that has the same toggle configuration as our production environment. So before we release, there is this very last high-level, functional, end-to-end integration test environment
Starting point is 00:18:42 that has the same feature toggle configuration as our production environment. So as long as we don't flip feature toggles, we have just one configuration that we have to test. We don't have to test permutations. But now you may toggle on or off one or the other feature, but those rarely happen randomly. And if they do, it's mostly in an error case where everything went wrong already. And how do you test for the case where everything is going to hell? At that moment,
Starting point is 00:19:13 you need to react anyway. So you can do a little more harm, I would say. And it's really rare. And you test before you toggle. We also test manually, right, as I said, on this environment. So we make sure that we only switch on the toggle on the last environment before live production. And then we manually test the same configuration. If we are a bit unsure, we can still run the same test suite again and make sure it runs.
Starting point is 00:19:41 And then we have all the toggles in production, and just one is different from the other environment. And then we go and say, okay, the feature is fine. We toggle it on in production. We have the same configuration again. So this is what I mean by: in theory, all toggles could be different, but in practice, a flip of a feature toggle happens only occasionally,
Starting point is 00:20:04 and therefore you don't have to test all permutations at all points of time. So you shouldn't have that business owner with tears in their eyes turning features on and off at will. That should be much more planned. Yeah, right. Going home and in the evening just saying, oh, this looks nice. I'll turn it on. I think it's ready enough. Yeah.
Starting point is 00:20:21 No, that shouldn't work this way. A more disciplined and mature team, as I said in the other episode. Another way, if you're a bit unsure, what you can also do, what worked well on another project, was to toggle all features off and run the tests, and toggle all features on and run the tests. Because feature toggles are really new functionality, and they shouldn't, or they rarely, change your core business or the core user flow or the core of your product. They are tiny bits and pieces that are added everywhere, so they will rarely break your big functional tests. Unless you have a massive test suite that has everything; then you're doomed in many other ways. But if you have a very thin layer of end-to-end tests that just tests your core business process, it usually isn't affected by those feature toggles. So we toggle all off, let the tests run, toggle all on, let the tests run. It's green both times, and we deploy it then. It sounds a bit funny, but we had really good experiences with that as well.
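If Togglz is in play, its JUnit support makes exactly that all-off/all-on pair of runs easy to script; a sketch (the test name and placeholder assertion are illustrative):

```java
import static org.junit.Assert.assertTrue;

import org.junit.Rule;
import org.junit.Test;
import org.togglz.junit.TogglzRule;

public class CoreFlowAllTogglesOnTest {

    // Force every feature on for this run; a sibling class using
    // TogglzRule.allDisabled(Features.class) covers the all-off run.
    @Rule
    public TogglzRule togglz = TogglzRule.allEnabled(Features.class);

    @Test
    public void coreBusinessProcessStillWorks() {
        // ... drive the thin end-to-end happy path here ...
        assertTrue(true); // placeholder assertion
    }
}
```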
Starting point is 00:21:11 You know, the description of this sounds like it's a hack way to be able to deploy features
Starting point is 00:21:33 in a microservice style to monolithic code, because typically you think of monolithic code as being large releases with a bunch of changes. There's not too often a case where there's a small release. You know, like with microservices, you can make a small change to one service and deploy it; with a monolith, it's a lot more time-consuming, with a lot larger impact. But by putting in these feature flags,
Starting point is 00:21:56 you're effectively allowing these little small changes to go in, in that microservice spirit, without actually breaking your monolith down into microservices yet. Yeah, you're decoupling bits and pieces of your application in order to release them independently, in a way. But everything I said applies just as much to feature toggles for microservices, right? Even if you have microservices, you don't want to ship half-baked front-end features. Yeah, but that's a given in microservices. There's always this idea with a monolith that it's monolithic and everything's got to be big.
Starting point is 00:22:29 This is a way to make small changes to a monolith in the spirit of microservices, without actually having broken your product into microservices yet. So it's pretty cool. It's a cool little hack-around. I think the bigger challenge for a monolith usually is to get the pipeline fast enough to make it worth the effort. Many monoliths need a lot of hand-holding while being released and a lot of care while being shipped. I think this is almost the bigger challenge; once you've done this, adding a few if statements or the Togglz library is, to be honest,
Starting point is 00:23:04 even the smaller, the easier part. Cool. Hey, this was quite a nice excursion into feature flags and feature toggles, and thanks for the insights. Did we answer all the questions from the beginning? I think we did, yeah, because we talked about... yeah, we definitely... I think you earned a badge. So we may come up with some badges we use. You can print it out and put it in your email signature. Exactly. Excellent. I know you mentioned Togglz as the framework; any others? Because, you know, it's great to name one, but maybe it's better to name multiple, in case there are some other toggle frameworks that you've seen. We used Togglz everywhere if there was a library in place. Otherwise, it was way, way,
Starting point is 00:24:06 way smaller, custom-built toggles, because it was smaller services that didn't require big libraries. I'm sorry. Yeah. That's cool. Perfect. If you use it that heavily, then I'm sure it's a great framework and people should look into it.
Starting point is 00:24:22 It's a big and old one. It's well-supported. It's really doing the job. So, yeah. Perfect. The only thing that I wanted to add, to the monitoring aspect: I know we have some listeners that are obviously using our products, and we have a great feature that maybe not many people know about. Especially when you do front-end features: if you are turning on a toggle, and let's assume that toggle is obviously generating some new HTML to show that feature, Dynatrace, with our real user monitoring feature, our UEM, can actually capture this as metadata. So we can actually say which users are actually exposed to which version or to which feature. And then we can
Starting point is 00:25:06 put them into categories and say, hey, how many people have been exposed to the version with the feature toggle on versus feature toggle off, or to this feature toggle and this feature toggle. And then you can specifically monitor user behavior. So, folks that are interested: we can capture metadata through UEM, through our JavaScript agent, and then we can analyze this. Just an FYI for customers that are listening: check it out. That's very nice. I can add that this is very nice for prototyping, right? If you just throw out a quick click dummy as a prototype, you can see if people react at all, and if it's worth spending two months to build the actual feature.
Starting point is 00:25:47 Exactly. Very nice. And the only other thing I would add to the monitoring there is to make sure you make an indicator that you've turned on a flat in your monitoring tool that you've turned on, toggled on a feature so that if other things do start going south you um you know oh we just turned on a feature as well right exactly we can tell we can we can now use our rest api to tell dynatrace about we just made this configuration change or deployment or in this case it would be a configuration change right which it is right yeah and finn were you practicing clapping i heard a couple of this going on. Were you testing out the – Was I? No, no, no.
Starting point is 00:26:27 And Finn, were you practicing clapping? I heard a couple of these going on. Were you testing out the... Was I? No, no, no. I don't know if you remember: a year ago, when we did the first episode, we laughed hard that I was sitting there, with the beer in my last name, in Germany, after four, and I didn't have a beer. Ah, yes. So this time I brought one. Maybe I put it down on the table a bit too loud. I'm sorry. No, that's okay.
Starting point is 00:26:42 I was just referencing back to the previous episode, when we were talking about... so, you know, in order to have your beer make a sound, you have to have the table, and the beer and the table have to work together. Exactly. Yeah, that's perfect. So now I have the last question: what's the brand, what's the brewery? Berliner Berg. It's a new one, and it's a little brewery that brews really nice beer. Not the standard huge breweries, you know, that all produce the same product that is just slightly different, but a beer with character. And the only brand we have in the fridge here at work at the moment. But it's nice.
Starting point is 00:27:18 Oh, Prost. Sorry? I said Prost. Prost, Prost. Yes, Prost. Cheers. Na zdrowie. All right. I would say, hey, Finn, thanks again for doing the second episode. I think it was very insightful. And I think there are many other of these techniques that we should actually cover in future episodes besides feature toggling. I think in the future we should talk about
Starting point is 00:27:46 blue-green deployments or canary releases. There are all sorts of techniques that favor rapid deployments, or dark deployments, whatever you want to call them. Cautious deployments even, yeah. Maybe, yes. Yeah. So we should think about that. And now, you said you want to come back anyway once your project evolves a little bit, and then we can talk about this, and then we should also think about other techniques. Awesome. Other than that, to summarize: on feature flags, there's a lot of great advice that you gave on how to test. Obviously, if you have multiple feature flags, you want to make sure you always test with what is currently in production, and then turn on the feature flag that you want to turn on soon.
Starting point is 00:28:32 So please coordinate with your business analysts or with your product owners. And I guess what I also learned is that feature flags don't stay around forever; eventually, obviously, the features that you toggle just become part of your standard feature set, so you basically remove the if statement, which it basically is, in the next deployments. So that was interesting. And yeah, monitoring is important. Figure out the user behavior,
Starting point is 00:29:01 how users react to the new features that they get exposed to. And yeah, start experimenting. I think that's also a cool thing: start experimenting with new features, and use feature flags for that as well. And all I've got to say is thank you. I always enjoy our talks. I'm always overwhelmed with new information that I get to think about and ponder upon. So it's always greatly appreciated. So thank you once again for joining us. Yeah, it was a very cool episode, I think. No, that was a beautiful summary.
Starting point is 00:29:33 I mean, if I could have said this in those nice five sentences, it would have been a two-minute episode. Next time, maybe. But then we wouldn't have had all the nuance, which is where you get the learning from.
Starting point is 00:29:48 but Andy Andy is a Andy's got a special skill and one day we'll see him doing something much grander with it but at least
Starting point is 00:29:57 at least I have one yeah at least I have one special skill well that and your dancing your dancing is number two right yeah that's maybe
Starting point is 00:30:04 yeah that's true alright All right, yes. All right, well, thank you again. It's been a pleasure to have you. And we'll see who... Thank you, too. We'll have to see who becomes a member of the Three Timers Club first. So, challenge is on, people. Challenge accepted.
Starting point is 00:30:21 All right. Thank you very much. Bye-bye. Bye-bye. Bye-bye.
