How to succeed at infrastructure automation

I care a lot about infrastructure automation and the art of turning infrastructure into code. I find pleasure in using and developing build, test, and deployment systems. It’s what I get paid to do every day and can’t stop doing after work. In a sense, I’m obsessed with automation.

For an engineer like me, the questions of what to automate and how to go about it are of particular interest. Alas, I don’t always have the right answer to those questions. That’s when things go wrong and mistakes happen.

Fortunately, mistakes are also a wonderful opportunity to learn – and to eventually succeed.

Screwing up

I’ve been automating tasks of one kind or another for about a decade now. It goes without saying that I made some rookie mistakes along the way. I don’t mean blunders like pushing buggy code or executing commands in the wrong environment (oops). These things are unavoidable. What I mean are more fundamental problems that go beyond mere technical matters.

Let me share three personal anecdotes about screwing up in one way or another:

I once spent three full days writing automated tests for a shell script that performs backups. I wanted to change the script – a beast of hundreds of lines of messy code – without breaking the backup process. After wasting a couple of hours trying to tame the test framework into doing what I wanted (how hard could it be?), I already knew that this wasn’t going to end well. While I eventually managed to write the tests, they ended up being very brittle and verbose, adding little confidence. Even worse – and hindsight bias aside – at no time did those three days feel like they were adding any value. It was just my ego pushing me.

The first configuration management system I learned to use – and still the one I like the most – is Chef. I have a thing for Chef because it’s based on Ruby, which happens to be one of my favorite programming languages. When I started working at Jimdo in 2013, however, I suddenly had to use Puppet. Instead of coming to grips with Puppet and accepting the situation as it was, I spent my first weeks at the new gig ranting about how bad this Puppet thing was and how Chef would magically solve all problems. Of course, this didn’t change anything. I knew that sooner or later I would have to learn Puppet if I wanted to work on Jimdo’s infrastructure.

The last mistake is the worst of the three. I failed to deliver because I didn’t care enough about a project as a whole. The project’s goal was to add proper monitoring to our cron jobs. For this, we developed a tool in Go that would allow us to wrap cron jobs and send results to Nagios. It worked out pretty well. The problem: building that tool was only a small part of the project, though by far the most interesting one. Rather than wrapping up the remaining tasks, I was too busy learning more about the shiny technologies we explored at that time (Go and AWS). This isn’t the full story; suffice it to say I wasn’t part of the team that completed the project later on.

The three rules of infrastructure automation

It goes without saying that I don’t want to repeat any of those mistakes. To achieve that, I’ve turned the lessons I learned over the years into three simple rules. I call them, appropriately, The three rules of infrastructure automation. I’m convinced that by following these rules, I’ve been able to improve my work and become a more successful engineer.

The three rules are:

  1. Don’t blindly automate all the things. Take a step back and evaluate if a task is actually worth the effort. Feel the pain before doing something about it. Ignore the problem for a while; it might not be that big of an issue. If you still decide to invest in automation, you will know exactly where the pain points are. Ask for feedback when you’re stuck. Always be willing to adjust and, more importantly, have the courage to stop what you’re doing and move on.

  2. Treat tools as what they are: a means to an end. Don’t fight over tools or programming languages. Use whatever works for you or your company. In the end, it’s all about creating (business) value. More often than not, tools are not the reason why we fail to reach our goals. Before complaining about software, remember that certain design decisions probably made sense to the people at the time they built it. Be open to learning new things.

  3. Take ownership of your work. First and foremost, do the work and deliver what you promise (while adhering to the first two rules). Get both the interesting and the boring tasks done. Yes, automation isn’t always fun. That’s just how it is. Accept it and move forward. Taking ownership also means continuing to care. Fix bugs. Help others who rely on your work. Don’t automate and forget.

I’ve learned to live my professional life by the above rules. They provide me with the right mental framework for approaching automation, if not programming in general. These rules were and continue to be a tremendous help to me. They may help you too.
