Infrastructure & Software Automation: When You Should Use It (And Why)

It’s no secret that I like automation, so much so that I wrote a book called Learn Ansible which, as you may have guessed, covers the basic usage of Red Hat’s Ansible. While I try to automate as much as possible, there are a few basic rules I stick to and questions I ask myself when deciding which tasks I should spend time automating.

Probably the most crucial question is “why am I writing the automation in the first place?” If the task I have been asked to automate is only ever going to be executed once, or if it is so bespoke that it could only ever be executed against a single host performing a single task, then the time would be better spent reviewing the backup strategy of the machine, any DR requirements, and the actions that can be taken to make the host as highly available as possible.

Now that might not sound like a very ‘DevOps’ approach, but there is still a lot of value in what is now considered a more traditional way of working. As long as the host or service you are deploying is well documented, and you have a good (by which I mean well-maintained and tested) backup and recovery strategy, not everything has to be all-singing, all-dancing, highly scalable and automated.

If the task passes that test and I know I should be automating it, I need to think about the approach. This includes considering:

How should the automation be written?
- Nothing should be hardcoded within the automation that would stop it from being executed against multiple hosts. Given the task we are looking at automating, is this possible?
Who will be running the automation and what is their experience?
- Some tasks may be triggered by an event while others may need human intervention. In my previous blog post, Automating Deployments Using Ansible AWX & Jenkins, I gave a few examples of how we are triggering playbook runs internally at N4Stack.
Can parts of the automation be re-used?
- It is good to build up a collection of core functions which can be re-used and shared with other team members.
Do I need logs and output?
- Sending notifications to a Slack channel is excellent for immediate feedback, but where else do I need to send information for longer-term storage? For example, does an issue need opening, updating and closing in Jira or ServiceNow for each job run?
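
To make the first few questions a little more concrete, here is a minimal sketch in plain Python rather than Ansible itself. Everything in it is hypothetical and for illustration only: the `Host` record, the `render_config` and `run_for_hosts` helpers, the host names and the template. The point is simply that the host details are passed in rather than hardcoded, the step is a small function that can be re-used, and every run leaves a log line behind.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("automation")


@dataclass(frozen=True)
class Host:
    """Hypothetical host record; in practice this would come from an inventory."""
    name: str
    package_version: str


def render_config(host: Host, template: str) -> str:
    """Reusable step: fill a template from host variables, nothing hardcoded."""
    return template.format(name=host.name, version=host.package_version)


def run_for_hosts(hosts: list[Host], template: str) -> dict[str, str]:
    """Apply the same step to every host and log one line per host."""
    results: dict[str, str] = {}
    for host in hosts:
        results[host.name] = render_config(host, template)
        log.info("rendered config for %s", host.name)
    return results


if __name__ == "__main__":
    # Made-up hosts and template, purely for demonstration.
    hosts = [Host("web01", "1.4"), Host("web02", "1.5")]
    template = "server_name={name} app_version={version}"
    for name, config in run_for_hosts(hosts, template).items():
        print(name, "->", config)
```

In a real playbook the same idea shows up as variables and roles rather than functions, and the log lines would go wherever your team needs them for the longer term.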

Once the playbooks have been written and tested, it’s time to make them live. The point at which other people start using them is usually when the tweaking happens – there are nearly always improvements to be made, either by myself or by the rest of the team. Changes and improvements are always welcome.

So why go to the effort of automating your tasks in the first place?

My biggest reason for wanting to automate infrastructure and the software stack is to introduce consistency – if I have to do the same task repeatedly then that task should be done in precisely the same way each time. This consistency makes the hosts/applications which have been deployed easier to support as everything ‘should’ have been initially deployed in the same way.

In the blog post I mentioned earlier, I explained that our development team do not have access to any of the hosts which are being managed by our playbooks. This means that, other than the hosts running slightly different versions of the code base, the software stack and configuration should be the same across all of the hosts. This approach also allows me to test my configuration changes in a controlled way, just as the developers do with the dev, test and production branches of the applications.

Next up, time. There never seem to be enough hours in the day, so taking the time to automate a single host means that if I am suddenly asked to launch 100 copies of that host, I can do so without having to manually work through the dreaded configuration/install spreadsheets that are unfortunately still all too commonplace.
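
As a rough illustration of why going from one host to 100 is cheap once nothing is hardcoded, here is a small Python sketch that generates an INI-style Ansible inventory. The `build_inventory` helper, the group name, the host naming scheme and the IP range are all assumptions made up for this example. The only Ansible-specific detail is the `ansible_host` inventory variable.

```python
from ipaddress import IPv4Address


def build_inventory(group: str, count: int, first_ip: str) -> str:
    """Return an INI-style inventory with `count` hosts in a single group.

    Hosts are named <group>001, <group>002, ... and given consecutive
    IP addresses starting at `first_ip` (a hypothetical addressing scheme).
    """
    start = IPv4Address(first_ip)
    lines = [f"[{group}]"]
    for i in range(count):
        lines.append(f"{group}{i + 1:03d} ansible_host={start + i}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # One host or one hundred: only the count changes.
    print(build_inventory("web", 100, "10.0.0.10"))
```

The same playbook can then be pointed at the generated file, which is a lot less painful than filling in a spreadsheet row for each host.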

Finally, automating as much of the infrastructure and software stack as possible removes me as a single point of failure – all of my playbooks are commented, checked into version control, and I am using the same tools as the rest of the team. This allows us to collaborate on the automation but also allows, if needed, someone to take over the project altogether.

How can this apply to you?

Unfortunately, the following meme, while funny, still rings true.

We have been working with customers who have expressed a need to take the production operating system/software stacks and configuration we manage for them to their local development machines. They need to be able to easily and consistently launch a local virtual machine which mirrors the stack we will be providing in production.

As each customer deployment has been configured for that one customer’s requirements, this has to be a very collaborative process – not only do we need to understand that customer’s requirements, we also need to ensure that we do not paint our customer into any corners when it comes to future requirements.

Using established open source tools such as Ansible alongside VirtualBox, Vagrant and others means that we can offer support for local development across a multitude of platforms and configurations.

For more information on how we can help and support you, visit our DevOps (SRE) Managed Service page.

Russ McKendrick

Practice Manager (SRE & DevOps)

Russ heads up the SRE & DevOps team here at N4Stack.

He's spent almost 25 years working in IT and related industries and currently works exclusively with Linux.

When he's not out buying way too many records, Russ loves to write and has now published six books.
