The Good Tech Companies - Deployments: The Irrational Fear of Them

Starting point is 00:00:00 This audio is presented by HackerNoon, where anyone can learn anything about any technology. Deployments: The Irrational Fear of Them, by Aviator. The anxiety of deployments is real. Let's take a stab at understanding the human emotions related to deployment and learn best practices to minimize the fear. A recent outage involving CrowdStrike impacted 8.5 million Windows devices, leading to disruptions in various global services, including airlines and hospitals. Multiple analyses have examined the root cause of this incident itself.

Starting point is 00:00:39 However, as a software engineer, I think we are missing the aspect of human emotions related to deployments, specifically the fear of breaking production. That's what we will try to dive into in this article. We will cover the function of release engineering, what software engineers care about and what they don't, the impact of continuous deployment (CD), a look at manual deployments, the problems with manual deployments, and the solutions to these problems. Release engineering. Before delving into the fear of deployments from a software engineer's perspective, let's first understand the role of a release engineer. Release engineering has evolved considerably in recent years, thanks to modern CI and CD tools and the standardization of Kubernetes. Despite these advancements, the primary responsibilities remain the same. Consistent and repeatable deployments. Standardizing release processes reduces the risk of bad

Starting point is 00:01:25 deployments to production. Reducing service disruptions. Standardized processes also ensure teams are equipped to tackle harmful incidents in the production environment, for example with a rollback strategy for scenarios where a release causes problems. Monitor and optimize performance. Look for performance improvements for faster and more reliable deployments. Collaborate with engineering. Work closely with developers, QA, and DevOps teams to ensure all new and existing services have a well-defined deployment process. What software engineers care about. Unlike release engineers, as software engineers working in a product team we may only care about certain aspects of deployments. Quick code merges. Merging quickly allows us to validate our work and move on to new tasks or unblock dependent tasks. Production incidents. Although engineers

Starting point is 00:02:14 may not care about all production incidents, they definitely care about their own code changes causing production outages. Deployment schedule. Engineers also like to track when their changes go live or have gone live so that they can get real-time feedback on their changes. What software engineers don't care about. Although there are things we care about, there are also those we don't. Deployment methodology. Although we know the need for an efficient and reliable deployment process, we don't care how it is performed. Effect of other changes. Unless things go wrong, we don't worry about unrelated changes from other developers. Deployment management. An engineer is

Starting point is 00:02:51 indifferent to who manages deployment in a software team. For instance, we would only care about managing deployment if tasked with doing so. Impact of continuous deployments (CD). So what does the fear have to do with continuous deployments? A lot. Studies have shown several benefits (https://dora.dev/capabilities/continuous-delivery/) of continuous deployment (CD), and unsurprisingly, many of them are psychological in nature. Continuous deployment removes the human in the loop; therefore, it requires strong trust in the test infrastructure. In other words, automated tests not only ensure the

Starting point is 00:03:45 reliability of production but also provide psychological safety, sometimes irrationally, reducing the fear of deployments. As a developer, I'm more comfortable making changes in a CD process than when I'm asked to verify the changes manually. However, despite the popularity of these CD strategies, a lot of companies still trigger deployments manually, with a human in the loop, indicating a cautious approach to CD implementations. This behavior suggests that teams prefer to retain supervision of the release process and intervene where necessary. This is important to understand from a psychological safety perspective. Manual deployments imply that someone is overseeing the process and handling issues when things go wrong. While this provides a sense of security, it can also induce fear in

Starting point is 00:04:31 the person deploying and is prone to human error. Manual deployments. Despite the drawbacks, most teams manage deployments manually. A typical manual deployment may include a few elements. Supervision. Someone babysits the entire deployment process before a release goes out. This person is tasked with intervening when and if there are signs of trouble. Teams maintain an on-call person who manages their deployments and handles problems when they arise. Dedicated release teams. Some teams have a dedicated release engineering team which ensures releases go smoothly. Since this means a high degree of specialization, the deployment process could be more efficient and reliable.

Starting point is 00:05:10 Spreadsheets. Some companies maintain a spreadsheet to validate any changes made. This allows companies to systematically review and approve these changes, ensuring they meet predefined quality standards. Manual QA. In addition to spreadsheets, manual QA is another layer companies add. Manual QA tests new releases in staging environments before deploying them to production. However, a testing environment isn't foolproof, so some real-life scenarios won't be accounted for. Where do things go wrong with manual deployments? Many things can go wrong for any software

Starting point is 00:05:44 development team relying solely on manual deployments. Dependence on a small group. This can create bottlenecks, which lead to release delays and, in some instances, human error. A team can also run into problems when a specific person leaves or can't deliver on the required tasks. No risk mitigation strategy. There is no strategy to follow in an unfavorable production incident. When an incident happens, the release team has to scramble to find the relevant stakeholders to help resolve it and make decisions. Prone to human error. Typographical

Starting point is 00:06:16 errors in commands or scripts, or forgetting to run the pre-deployment or post-deployment steps. High effort. Since deployments require babysitting the process, they become a time-consuming effort, which also causes the frequency of deployments to drop significantly. For instance, if it requires an hour to monitor the entire deployment, the release team may decide to skip deployments on days with minor changes to save that time. Communication breakdown. It's unclear to product teams what the state of the releases is and when their changes are getting into production. Looking at these challenges, it's easy to understand why engineers dread deployments. The risk of deployment failures, the high stakes,

Starting point is 00:06:55 and the pressure to keep downtime low also contribute to this fear. These failures can be minimized by increasing test automation. Still, since these tests are carried out in a test environment, you should not expect an automated test to catch every possible error. Failures are to be expected, but at a reduced rate. What can we do about it? Simply set up continuous deployments? Easier said than done. Despite the drawbacks, manual deployments are still okay if managed well. The goal should be to provide guardrails to avoid production incidents, reduce human errors, enable anyone to trigger deploys, and ensure deployments happen frequently. Guardrails: canary and rollbacks. Canary and rollback strategies can

Starting point is 00:07:36 help reduce the impact of an outage and in many cases avert the crisis automatically. A canary release exposes your new release to a small portion of production traffic. This gives teams insight into issues that might not have come up during testing. A rollback strategy, on the other hand, helps engineers revert a release to its previous stable state. It is used when new problems arise after a deployment to the production environment. Reduce human errors. Standardization defines standard deployment methodologies that result in efficiency, consistency, reliability, and high software quality. In their State of DevOps report, DORA shows that reliability predicts better operational performance.
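
The canary and rollback guardrail described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `run_canary` and `error_rate_at`, the traffic steps, and the 1% error threshold are hypothetical and not taken from the article or from any particular deployment tool.

```python
from dataclasses import dataclass


@dataclass
class CanaryResult:
    promoted: bool
    final_traffic_percent: int
    reason: str


def run_canary(error_rate_at, steps=(5, 25, 50, 100), max_error_rate=0.01):
    """Shift traffic to the new release step by step (steps must be non-empty).

    error_rate_at is a hypothetical callable: given the percentage of traffic
    on the new release, it returns the observed error rate (0.0 to 1.0).
    """
    for percent in steps:
        observed = error_rate_at(percent)
        if observed > max_error_rate:
            # Automatic rollback: all traffic returns to the previous stable release.
            return CanaryResult(
                promoted=False,
                final_traffic_percent=0,
                reason=f"rolled back at {percent}%: error rate {observed:.2%}",
            )
    # Every step stayed under the threshold, so the release is fully promoted.
    return CanaryResult(True, steps[-1], "promoted to all traffic")
```

In a real system, `error_rate_at` would read from monitoring after a soak period at each step. The point of the sketch is that the rollback decision is made by a predefined rule rather than by a person under pressure.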

Starting point is 00:08:17 Furthermore, having a standardized process allows repeatability in release processes, which can then be automated. Automating this process helps a team keep production costs lower. Democratize the deployment process. Democratizing the deployment process removes the reliance on specific individuals. If we empower any software engineer to deploy, it slowly reduces the fear. If anyone can deploy, it should not be too hard. Share your Legos. Frequent deployments to reduce deployment anxiety. We need to deploy more frequently, not less. The DORA report also highlights that smaller

Starting point is 00:08:51 batch deployments are less likely to cause issues and help lower the psychological barrier for developers. Improved developer experience. Clarifying what is being deployed enhances the developer experience. Make it easy for developers to know when deployments occur and what changes are included. This transparency helps developers track when their changes go live and simplifies incident investigations. Defined risk mitigation strategies. There should be defined steps to follow for rollbacks and hotfixes, as this helps eliminate indecision during production incidents. For instance, there should be separate build and deploy steps for teams to follow for easy rollbacks.
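
One way to picture why separate build and deploy steps make rollbacks easy is the minimal Python sketch below. It assumes that every version is built once into a stored artifact and that a rollback simply redeploys the previous artifact without rebuilding. The `ReleaseLog` class and the artifact names are hypothetical, chosen only for illustration.

```python
class ReleaseLog:
    """Keeps built artifacts and the order in which versions were deployed."""

    def __init__(self):
        self._artifacts = {}  # version -> artifact identifier
        self._history = []    # deployed versions, newest last

    def build(self, version, artifact):
        # Build step: each version is built exactly once and never overwritten.
        if version in self._artifacts:
            raise ValueError(f"version {version} already built")
        self._artifacts[version] = artifact

    def deploy(self, version):
        # Deploy step: only a version that was already built can go live.
        if version not in self._artifacts:
            raise KeyError(f"version {version} has not been built")
        self._history.append(version)
        return self._artifacts[version]

    def rollback(self):
        # Rollback: drop the current release and redeploy the previous artifact.
        if len(self._history) < 2:
            raise RuntimeError("no previous release to roll back to")
        self._history.pop()
        return self._artifacts[self._history[-1]]

    @property
    def live_version(self):
        return self._history[-1] if self._history else None
```

Because the rollback path reuses an artifact that has already run in production, it avoids a fresh build in the middle of an incident, which is when a defined, low-decision procedure matters most.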

Starting point is 00:09:35 Similarly, standardizing how to deal with hotfixes and cherry-picks can make it simple to operate when the stakes are high. Feature flags. Feature flags are like kill switches that can turn off a new feature that caused an incident in production. This can enable engineers to resolve production incidents quickly. Conclusion. Software teams must treat release engineering as a priority from the outset of product development to avoid costly mistakes. And we should not let incidents like the CrowdStrike outage cripple our development practices. Addressing the fear of deployment and preventing production incidents involves several key strategies: invest in the standardization of deployment processes; set up well-defined risk mitigation strategies such as canary releases, strategic rollouts, rollbacks, and hotfixes; simplify the developer experience by democratizing deployments; and encourage everyone to participate. At Aviator

Starting point is 00:10:22 we are building developer productivity tools from first principles to empower developers to build faster and better. For a modern way to manage deployments, check out Aviator Releases. Thank you for listening to this HackerNoon story, read by Artificial Intelligence. Visit hackernoon.com to read, write, learn and publish.

