In Part One, I outline how to improve Financial Security through Automation. The journey through automation is unique to each organization; there is no ‘One Size Fits All’ approach. However, there are several practices to avoid when embarking on this journey.
As a Software Developer for Intelliware, I engage with Software Development programs that are looking to adopt a DevOps culture and improve their deployment automation processes. In many large organizations, software teams do not have influence over the infrastructure, network topology, or security requirements within their organization. This post will look at the areas that are within the control of a development team.
Typical Pitfalls of Automation
Pitfall #1: Only automating between the handoffs
Sometimes, multiple teams are required for a particular workflow. A development team can only automate the tasks they perform within that workflow. However, in situations where the largest bottleneck is introduced by an external team, the automation does not provide significant benefits to the entire software delivery process.
Tasks external to the development team may include:
- Network firewall change requests, e.g. to open ports between application endpoints
- New infrastructure, including:
  - physical servers
  - virtual machines
  - namespaces in Container Schedulers
- Requests to deployment teams to schedule production deployments
Conducting a value-stream mapping exercise will identify the largest bottlenecks in the entire delivery process. If a formal value-stream mapping exercise is not possible, meet with downstream teams to identify the single largest bottleneck that, if removed, would reduce delivery time. Work together to eliminate the bottleneck, even if it means postponing some feature development.
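As a rough illustration of what a value-stream map captures, the sketch below records each step's hands-on time and waiting time and picks out the step that consumes the most elapsed time. The step names, owners, and hour figures are invented for the example; a real exercise would use measurements gathered from the teams involved.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    owner: str
    work_hours: float  # time someone is actively working on the step
    wait_hours: float  # time the change sits idle before the step starts

    @property
    def elapsed_hours(self) -> float:
        return self.work_hours + self.wait_hours


def largest_bottleneck(steps):
    """Return the step that contributes the most elapsed time."""
    return max(steps, key=lambda step: step.elapsed_hours)


# Hypothetical figures, for illustration only.
value_stream = [
    Step("Code review", "development", 2, 6),
    Step("Firewall change request", "network", 1, 72),
    Step("Provision virtual machine", "infrastructure", 4, 40),
    Step("Schedule production deployment", "deployment", 1, 24),
]

bottleneck = largest_bottleneck(value_stream)
total_hours = sum(step.elapsed_hours for step in value_stream)
print(f"Largest bottleneck: {bottleneck.name} ({bottleneck.owner} team), "
      f"{bottleneck.elapsed_hours:.0f}h of {total_hours:.0f}h")
```

Separating waiting time from working time is the useful part: a step can take an hour of effort and still dominate the delivery timeline.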
Pitfall #2: Not automating the verification
A development team has deployed their service in production with an automated deployment process. Many teams stop here. While this is a significant improvement over the previous process, manual verification is still required. A team member will stand by during deployments, waiting idly for their turn to verify that the deployment was successful.
This manual verification step is not scalable for two reasons. First, very frequent deployments may require staff dedicated solely to verifying the deployments. Second, the productivity of the team declines if more and more time is dedicated to monitoring the increasingly frequent deployment schedule.
Frequent deployments require automated validation. When team members are no longer required to monitor the deployment and perform mundane smoke tests, they have more time to focus on building new features and improving customer value.
Start by creating detailed documentation of the verification steps that are performed. Make sure that all manual activities are recorded. Invest in automation software to perform these steps during future deployments. Personally, I prefer accessing web APIs directly using SoapUI or raw code instead of using screen recorders or Selenium WebDriver frameworks. For the first few executions, it’s perfectly reasonable to perform both the manual and automated verification to build trust in the automated solution.
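The “raw code” approach can be quite small. The following is a minimal sketch of a post-deployment smoke test that calls a web API directly using only the Python standard library. The `/health` endpoint, its `status` and `version` fields, and the example URL are assumptions made for illustration; substitute whatever your service actually exposes.

```python
import json
import urllib.request


def fetch_json(url, timeout=5.0):
    """GET a URL and return (HTTP status, decoded JSON body)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


def verify_deployment(base_url, expected_version, fetch=fetch_json):
    """Return a list of failure messages; an empty list means the checks passed."""
    try:
        status, body = fetch(f"{base_url}/health")  # hypothetical endpoint
    except (OSError, ValueError) as error:
        # Connection errors and timeouts are OSError; invalid JSON is ValueError.
        return [f"health check failed: {error}"]

    failures = []
    if status != 200:
        failures.append(f"expected HTTP 200, got {status}")
    if body.get("status") != "ok":
        failures.append(f"service reports status {body.get('status')!r}")
    if body.get("version") != expected_version:
        failures.append(
            f"expected version {expected_version}, got {body.get('version')!r}"
        )
    return failures


# Stand-in for a live service so the sketch runs without a network connection.
def fake_fetch(url):
    return 200, {"status": "ok", "version": "1.4.0"}


print(verify_deployment("https://example.invalid", "1.4.0", fetch=fake_fetch))
```

In a pipeline, the same function would be called with the default `fetch` right after the deployment step, and a non-empty list of failures would fail the build.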
Pitfall #3: Automating a bad process
Most web application teams I work with have some sort of UI test automation, typically using Selenium WebDriver. WebDriver-based automation tests are often over-engineered; they require the entire application to be running instead of stubbing out the back-end and running the UI in isolation. End-to-end testing of this nature creates a lot of complexity and unnecessary latency.
WebDriver code is used to perform all steps of the test setup and execution, including:
- logging into the application
- creating test data, which often requires navigating through multiple screens
- performing the function under test
For each action performed in the UI, the test code must wait for the next page to load, or for each web request to return successfully. At each step, a page load may take longer than expected and the test may decide that too much time has passed, and fail due to a timeout. Anyone who has worked with end-to-end testing of this nature can likely relate to this scenario. The majority of end-to-end test suites fail often, and the failing test(s) in one run might complete successfully in the next.
This kind of automation doesn’t increase our confidence in the overall quality of our application!
Refer to Fowler for more information about the appropriate (small) ratio of integration tests to unit tests. Separate tests that validate GUI behaviour from the backend. When the backend is stubbed out, there is more control over the state of the GUI and less setup code is required. When performing integration/smoke tests with a fully functional backend, drive the tests with the API. This is a great way to verify that the API is complete, consistent, and is designed with testing in mind.
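To show why a stubbed backend needs so little setup, here is a small sketch of GUI logic tested against a stub. The `StubAccountApi` class, the `balance_label` function, and the account data are all hypothetical; the point is that each test puts the "backend" into exactly the state it needs in a single line, with no login and no screen navigation.

```python
class StubAccountApi:
    """Stand-in for the real backend: returns canned data instantly."""

    def __init__(self, balances):
        self._balances = balances

    def get_balance(self, account_id):
        return self._balances[account_id]


def balance_label(api, account_id):
    """GUI logic under test: turns a balance into the text shown on screen."""
    balance = api.get_balance(account_id)
    if balance < 0:
        return f"Overdrawn by ${-balance:.2f}"
    return f"Balance: ${balance:.2f}"


# Each test chooses the backend state directly, including awkward edge cases.
api = StubAccountApi({"chequing": 120.5, "savings": -15.0})
assert balance_label(api, "chequing") == "Balance: $120.50"
assert balance_label(api, "savings") == "Overdrawn by $15.00"
```

An overdrawn account is tedious to create through the UI of a running system, yet trivial to express here, which is the kind of control over state that stubbing provides.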
Pitfall #4: Not improving the overall cycle time
In Pitfall #1, we considered a situation in which a development team automates the processes it has complete control over and avoids those that require coordination with external teams. If the overall cycle time has improved only marginally (or not at all) because far more obvious bottlenecks remain elsewhere in the delivery process, the team has ignored the Theory of Constraints and its efforts have been misplaced.
What does this look like?
Consider the following scenario. Each software deployment must be approved by a Change Advisory Board (CAB) that meets weekly on Tuesday. Items to be reviewed by the CAB must be submitted by Thursday of the previous week.
Now imagine that a development team has streamlined their deployment process. The manual process used to require an entire day, but now takes less than an hour. Every change made by the team can now be deployed one day faster than before. However, all changes must still wait up to 7 days until the next CAB meeting. The time saved allows the team to move on to a new piece of work; however, this doesn’t equate to faster delivery. The next set of tasks is started sooner, but can encounter the same ‘unnecessary’ delay.
In this example, the primary bottleneck to delivery is the approval process. Software can be released at most once weekly in this scenario. The primary bottleneck was not addressed! Optimizations were made to shave hours off a delivery process that is otherwise measured in days. This kind of automation is underwhelming from the customer’s perspective.
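The arithmetic behind this scenario can be sketched in a few lines. The code below assumes the manual deployment takes one full day and the automated one takes one hour, and it deliberately ignores the Thursday submission deadline, which would only lengthen the wait. The dates are arbitrary examples.

```python
from datetime import date

TUESDAY = 1  # date.weekday() numbers Monday as 0


def days_until_cab(ready: date) -> int:
    """Days from the day a change is ready until the next Tuesday CAB meeting."""
    wait = (TUESDAY - ready.weekday()) % 7
    # Assumption: a change that is ready on a Tuesday has missed that day's meeting.
    return wait or 7


def lead_time_days(ready: date, deploy_days: float) -> float:
    """Total days from 'ready' to 'in production': approval wait plus deployment."""
    return days_until_cab(ready) + deploy_days


ready = date(2024, 1, 3)  # a Wednesday, chosen arbitrarily
manual = lead_time_days(ready, deploy_days=1.0)
automated = lead_time_days(ready, deploy_days=1 / 24)
improvement = (manual - automated) / manual

print(f"Manual: {manual:.2f} days, automated: {automated:.2f} days "
      f"({improvement:.0%} faster)")
```

However the deployment time is tuned, `days_until_cab` is untouched, so the approval wait continues to dominate the total.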
I can’t stress enough the benefit of a value stream mapping exercise. Try to foster the attitude that the entire organization is responsible for delivering value. That might mean that a development team works with upstream (e.g. network & infrastructure) teams or downstream (operations or support) teams to improve cycle time. There may be an opportunity for development teams to build internal tools to improve the upstream or downstream processes to facilitate short cycle times.
Pitfall #5: Not providing value to the end user
If automation isn’t making users’ lives better, our efforts are lost. Development dashboards are a common example of automation that may not provide user value. Metrics and dashboards can be invaluable in driving customer value if we pay attention to them. However, when efforts are made to automate the collection of data, and the results are ignored, we have achieved nothing for our customers.
Project managers, Agile coaches, and DevOps evangelists often collect the following data, sometimes for no reason other than because we are told we should:
- burndown charts
- sprint velocity
- cycle time
- commit frequency
- branch age / long-lived branches
This data is compiled into dashboards, and too often the dashboards are not used as the baseline for future experimentation. The values may trend in the right direction, or drift in the wrong direction without much intervention.
Choose your metrics carefully. Harness your inner Marie Kondo and ask yourself “does this bring your customers joy?”
Try to focus on metrics that are directly relevant to your business, such as sales figures, search hits, time spent on your commerce site and conversion rates. If you find that sales decreased after a particular feature goes live, perhaps you’ve introduced a bug or have degraded the user experience. This will tell you much more about your development efforts than knowing that the number of commits to your Git repo has declined from the previous week.
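One simple way to act on a business metric is to compare it before and after a release. The sketch below does this for conversion rate using made-up daily figures and a made-up release date. A drop on its own does not prove the release caused it, but it is a prompt to investigate.

```python
from datetime import date
from statistics import mean

# (day, visits, orders): hypothetical figures, for illustration only.
daily_figures = [
    (date(2024, 3, 1), 1000, 50),
    (date(2024, 3, 2), 1200, 60),
    (date(2024, 3, 3), 1100, 55),
    (date(2024, 3, 4), 1000, 30),
    (date(2024, 3, 5), 1150, 46),
    (date(2024, 3, 6), 900, 27),
]
release_day = date(2024, 3, 4)  # hypothetical go-live date of the feature


def mean_conversion(rows):
    """Average of the daily conversion rates (orders divided by visits)."""
    return mean(orders / visits for _, visits, orders in rows)


before = mean_conversion([row for row in daily_figures if row[0] < release_day])
after = mean_conversion([row for row in daily_figures if row[0] >= release_day])
change = (after - before) / before

print(f"Conversion before: {before:.1%}, after: {after:.1%}, change: {change:+.0%}")
```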
Automation is a great thing! We cannot march towards Continuous Delivery without it. However, the goal is Software Delivery; Automation is just a means to that end.