The Good Tech Companies - Deployments: The Irrational Fear of Them
Episode Date: September 30, 2024
This story was originally published on HackerNoon at: https://hackernoon.com/deployments-the-irrational-fear-of-them. The anxiety of deployments is real. Let's take a stab at understanding the human emotions related to deployment and learn best practices to minimize the fear. Check more stories related to tech-stories at: https://hackernoon.com/c/tech-stories. You can also check exclusive content about #software-deployment, #cicd, #continuous-delivery, #continuous-integration, #release-management, #devops-culture, #developer-productivity, #good-company, and more. This story was written by: @aviator. Learn more about this writer by checking @aviator's about page, and for more stories, please visit hackernoon.com.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Deployments. The irrational fear of them, by Aviator.
The anxiety of deployments is real. Let's take a stab at understanding the human emotions
related to deployment and learn best practices to minimize the fear.
A recent outage involving CrowdStrike impacted 8.5 million Windows devices,
leading to disruptions in various global services, including airlines and hospitals.
Multiple analyses have examined the root cause of this incident itself.
However, as a software engineer, I think we are missing the aspect of human emotions related to deployments, specifically the fear of breaking production. That's what we will try to dive into
in this article. We will cover the function of release engineering, what software engineers
care about and what they don't, the impact of continuous delivery (CD), a look at manual deployments,
the problems with manual deployments, and the solutions to these problems.
Release engineering.
Before
delving into the fear of deployments from a software engineer's perspective, let's first understand the role of a release engineer. Release engineering has
evolved considerably in recent years, thanks to modern CI and CD tools and the standardization
of Kubernetes. Despite these advancements, the primary responsibilities remain the same.
Consistent and repeatable deployments: standardizing release processes reduces the risk of bad
deployments to production.
Reducing service disruptions: standardized processes also ensure teams are equipped to tackle
harmful production incidents, for example with a rollback strategy for scenarios where a release causes problems.
Monitoring and optimizing performance: look for performance improvements that make deployments
faster and more reliable.
Collaborating with engineering: work closely with developers, QA, and DevOps teams
to ensure all new and existing services have a well-defined deployment process.
What software engineers care about.
Unlike release engineers, as software engineers working on a product team we may only care
about certain aspects of deployments.
Quick code merges: merging quickly allows us to validate our work and move on
to new tasks or unblock dependent tasks.
Production incidents: although engineers may not care about all production incidents,
they definitely care about their own code changes causing production outages.
Deployment schedule: engineers also like to track when their changes go live or have gone
live so that they can get real-time feedback on their changes.
What software engineers don't care about.
Although there are things we care about, there are also those we don't.
Deployment methodology: although we know the need for an efficient and reliable deployment process,
we don't care how it is performed.
Effect of other changes: unless things go wrong, we don't worry about unrelated changes
from other developers.
Deployment management: an engineer is indifferent to who manages deployments
in a software team. For instance, we would only care about
managing deployment if tasked with doing so.
Impact of continuous deployments (CD).
So what does the fear have to do with continuous deployments? A lot. Studies have shown several benefits
of continuous deployment (CD) (https://dora.dev/capabilities/continuous-delivery/), and unsurprisingly, many of them are psychological in nature.
Continuous deployments remove the human from the loop and therefore require strong trust in the test
infrastructure. In other words, automated tests not only ensure the
reliability of production but also provide psychological safety, sometimes irrationally,
reducing the fear of deployments. As a developer, I'm more comfortable making changes in a CD
process than when I'm asked to verify the changes manually. However, despite the popularity of these
CD strategies, a lot of companies still trigger deployments manually, keeping a human in the loop, which indicates a cautious approach to CD implementations.
This behavior suggests that teams prefer to retain supervision of the release process and
intervene where necessary. This is important to understand from a psychological safety perspective.
Manual deployments imply that someone is overseeing the process and handling
issues when things go wrong. While this provides a sense of security, it can also induce fear in
the person deploying and is prone to human error.
Manual deployments.
Despite the drawbacks, many teams still manage deployments manually. A typical manual deployment
may involve some of the following.
Supervision: someone babysits the entire deployment process
before a release goes out. This person is tasked with intervening if and when there are signs of
trouble. Teams maintain an on-call person who manages their deployments and handles problems
when they arise.
Dedicated release teams: some teams have a dedicated release engineering team
that ensures releases go smoothly. Since this means a high degree of
specialization, the deployment process can be more efficient and reliable.
Spreadsheets: some companies maintain a spreadsheet to validate any changes made.
This allows them to systematically review and approve these changes,
ensuring they meet predefined quality standards.
Manual QA: in addition to spreadsheets, manual QA is another layer companies add.
Manual QA tests new releases in staging environments
before deploying them to production. However, a testing environment isn't foolproof,
so some real-life scenarios won't be accounted for.
Where do things go wrong with manual deployments?
Many things can go wrong for any software development team relying solely on manual
deployments.
Dependence on a small group: this can create bottlenecks, which lead to release
delays and, in some instances, human error. A team can also run into problems when a
specific person leaves or can't deliver on the required tasks.
No risk mitigation strategy: there is no defined plan to follow when a
production incident occurs. When an incident happens, the release team has to scramble to find the
relevant stakeholders to help resolve it and make decisions.
Prone to human error: typographical errors in commands or scripts, or forgetting
to run the pre-deployment or post-deployment steps.
High effort: since deployments require babysitting the process,
they become a time-consuming effort, which also causes the frequency of deployments to drop significantly.
For instance, if it takes an hour to monitor the entire deployment, the release team may decide
to skip deployments on days with minor changes to save that time.
Communication breakdown: it's unclear to product teams what state the releases are in and
when their changes are getting into production.
Looking at these challenges, it's easy to
understand why engineers dread deployments. The risk of deployment failures, the high stakes,
and the pressure to keep downtime low also contribute to this fear. These failures can
be minimized by increasing test automation. Still, since these tests are carried out in a
test environment, you should not expect an automated test to catch every possible error.
Failures are to be expected, but at a reduced rate.
What can we do about it?
Simply set up continuous deployments? Easier said than done. Despite the drawbacks, manual deployments are
still okay if managed well. The goals should be to provide guardrails to avoid
production incidents, reduce human errors, enable anyone to trigger deploys, and ensure deployments
happen frequently.
Guardrails: canary and rollbacks.
Canary and rollback strategies can
help reduce the impact of an outage and in many cases avert the crisis automatically.
A canary release exposes your new release to a small portion of production environment traffic.
This gives teams insight into issues that might not have come up during testing.
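As a rough illustration of the canary idea, the sketch below sends a fixed slice of traffic to the new release and then compares error rates to decide whether to promote it or roll it back. It is a minimal sketch, not a description of any particular tool: the names `pick_release` and `canary_verdict`, the 5 percent share, and the 0.01 tolerance are all assumptions made for this example.

```python
import hashlib

CANARY_PERCENT = 5  # assumed share of traffic sent to the new release


def bucket_for(request_id: str) -> int:
    """Map a request or user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def pick_release(request_id: str, canary_percent: int = CANARY_PERCENT) -> str:
    """Send a fixed slice of IDs to the canary and everyone else to stable."""
    return "canary" if bucket_for(request_id) < canary_percent else "stable"


def canary_verdict(stable_error_rate: float, canary_error_rate: float,
                   tolerance: float = 0.01) -> str:
    """Promote the canary only if its error rate stays within the tolerance."""
    if canary_error_rate > stable_error_rate + tolerance:
        return "rollback"
    return "promote"


if __name__ == "__main__":
    print(pick_release("user-42"))
    print(canary_verdict(stable_error_rate=0.01, canary_error_rate=0.05))
```

Hashing the ID rather than picking at random keeps each user on the same release for the whole rollout, which tends to make any problems easier to trace.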
On the other hand, a rollback strategy helps engineers revert a release to its previous
stable version. It is used when new problems arise after a deployment to the
production environment.
Reduce human errors.
Standardization defines standard deployment
methodologies that result in efficiency, consistency, reliability, and high software quality.
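One way to picture a standardized deployment is as a fixed, ordered list of steps that a script runs the same way every time, so nobody has to remember the pre-deployment or post-deployment steps by hand. The sketch below assumes hypothetical step functions (`run_migrations`, `deploy_new_version`, `run_smoke_tests`) that stand in for whatever tooling a team actually uses.

```python
from typing import Callable, List, Tuple

# (step name, function that returns True on success)
Step = Tuple[str, Callable[[], bool]]


def run_deployment(steps: List[Step]) -> List[str]:
    """Run every step in order and stop at the first failure."""
    log = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later steps never run after a failure
    return log


# Hypothetical placeholder steps; real ones would call a team's own tooling.
def run_migrations() -> bool:
    return True


def deploy_new_version() -> bool:
    return True


def run_smoke_tests() -> bool:
    return True


STANDARD_STEPS: List[Step] = [
    ("pre-deploy: run migrations", run_migrations),
    ("deploy: roll out new version", deploy_new_version),
    ("post-deploy: smoke tests", run_smoke_tests),
]

if __name__ == "__main__":
    for line in run_deployment(STANDARD_STEPS):
        print(line)
```

Because the order lives in one list rather than in someone's head, the same sequence can later be triggered by an automated pipeline without changing the steps themselves.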
In their State of DevOps report, DORA shows that reliability predicts better operational performance.
Furthermore, having a standardized process allows repeatability in release processes,
which can be automated. Automating this process helps a team
keep production costs lower.
Democratize the deployment process.
Democratizing the deployment process removes the reliance on specific individuals.
If we empower any software engineer to deploy, it slowly reduces the fear.
If anyone can deploy, it should not be too hard. Share your Legos.
Frequent deployments to reduce deployment anxiety.
We need to deploy more frequently, not less. The DORA report also highlights that smaller
batch deployments are less likely to cause issues and help lower the psychological barrier for
developers.
Improved developer experience.
Clarifying what is being deployed enhances
the developer experience. Make it easy for developers to know
when deployments occur and what changes are included. This transparency helps developers
track when their changes go live and simplifies incident investigations.
Defined risk mitigation strategies.
There should be defined steps to follow for rollbacks and hotfixes, as this helps
eliminate indecision during production incidents. For instance,
there should be separate build and deploy steps for teams to follow for easy rollbacks.
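To see why separating build from deploy makes rollbacks easier, consider the minimal sketch below. Each version is built once into a stored artifact, and a rollback simply redeploys the previous artifact without rebuilding anything. The `ReleaseManager` class, the artifact naming scheme, and the version and commit values are all hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReleaseManager:
    artifacts: Dict[str, str] = field(default_factory=dict)  # version -> artifact name
    history: List[str] = field(default_factory=list)         # deployed versions, oldest first

    def build(self, version: str, commit: str) -> str:
        """Build step: produce an artifact once and store it under its version."""
        artifact = f"app-{version}-{commit[:7]}"
        self.artifacts[version] = artifact
        return artifact

    def deploy(self, version: str) -> str:
        """Deploy step: ship an artifact that was already built."""
        if version not in self.artifacts:
            raise ValueError(f"version {version} has not been built")
        self.history.append(version)
        return self.artifacts[version]

    def rollback(self) -> str:
        """Redeploy the previously deployed version without rebuilding it."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self.history.pop()  # drop the bad release from the history
        return self.artifacts[self.history[-1]]


if __name__ == "__main__":
    releases = ReleaseManager()
    releases.build("1.0.0", "a1b2c3d4e5")
    releases.build("1.1.0", "f6a7b8c9d0")
    releases.deploy("1.0.0")
    releases.deploy("1.1.0")
    print(releases.rollback())  # app-1.0.0-a1b2c3d
```

Since the rollback path never touches the build step, it avoids compiling or packaging under pressure in the middle of an incident.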
Similarly, standardizing how to deal with hotfixes and cherry-picks can make it simple to operate when the stakes are high.
Feature flags.
Feature flags are like kill switches that can turn off a new
feature that caused an incident in production. This can enable engineers to resolve
production incidents quickly.
Conclusion.
Software teams must treat release engineering as a priority
from the outset of product development to avoid costly mistakes. And we should not let incidents
like the CrowdStrike outage cripple our development practices. Addressing the fear of deployment and
preventing production incidents involves several key strategies: invest in the standardization of deployment processes; set up well-defined risk mitigation strategies
such as canary releases, strategic rollouts, rollbacks, and hotfixes; and simplify the developer
experience by democratizing deployments and encouraging everyone to participate.
At Aviator, we are building developer productivity tools from first principles
to empower developers to build faster and better. For a modern way to manage deployments,
check out Aviator Releases.
Thank you for listening to this HackerNoon story,
read by Artificial Intelligence. Visit hackernoon.com to read, write, learn and publish.