How to succeed at infrastructure automation

Update (March 2016): There’s now an updated and revised version of this article.

Readers of this blog know that I care a lot about infrastructure automation and the art of turning infrastructure into code. I find pleasure in using and developing build, test, and provisioning systems. It’s what I get paid to do every day and can’t stop doing after work.

For an engineer like me, the questions of what to automate and how to go about it are of particular interest. Alas, I don’t always have the right answer to those questions. That’s when things go wrong and mistakes happen. Fortunately, mistakes are also a wonderful opportunity to learn – and to eventually succeed.

Screwing up

I’ve been automating infrastructure of one kind or another for a couple of years now. It goes without saying that I’ve made some rookie mistakes along the way. (In fact, I still struggle from time to time, but who doesn’t?) I don’t mean technical mistakes like bugs, which are unavoidable. I mean more fundamental problems that go beyond the mere writing of code.

Here are three things I know I did wrong:

  • I once spent three full days writing automated tests for a shell script that performs backups. I wanted to change the script – a beast of hundreds of lines of messy code – without breaking the backup process. After wasting a couple of hours trying to tame the test framework into doing what I wanted (how hard could it be?), I already knew that this wasn’t going to end well. While I eventually managed to write the tests, they ended up being very brittle and verbose, adding little confidence. Even worse, and hindsight bias aside, at no point did those three days feel like they were adding any value. It was just my ego pushing me.

  • The first configuration management system I learned to use – and still the one I like the most – is Chef. I guess I mainly like Chef because it’s based on Ruby, which happens to be one of my favorite programming languages. When I started working at Jimdo in 2013, however, I suddenly had to use Puppet. Instead of coming to grips with Puppet and accepting the situation as it was, I spent my first weeks at the new gig ranting about how bad this Puppet thing was and how Chef would magically solve all our problems. Of course, this didn’t change anything. I knew that sooner or later I would have to learn Puppet if I wanted to work on Jimdo’s infrastructure.

  • The last mistake is the worst of the three. I failed to deliver because I didn’t care enough about a project as a whole. The project’s goal was to add proper monitoring to our cron jobs. For this, we developed a tool in Go that would allow us to wrap cron jobs and send their results to a monitoring system such as Nagios. It worked out pretty well. The problem: building that tool was only a small part of the project, albeit by far the most interesting one. Rather than wrapping up the remaining tasks, I was too busy learning more about the shiny technologies we used at that time (Go and AWS). This is not the full story; suffice it to say I wasn’t part of the team that completed the project later on.

The three rules of infrastructure automation

Naturally, I don’t want to repeat any of those mistakes. To achieve that, I’ve turned the lessons I learned over the years into three simple rules. I call them, appropriately, The three rules of infrastructure automation. I’m convinced that following these rules has improved my work and, in particular, made me more successful at infrastructure automation. The rules are:

  1. Don’t blindly automate all the things. It’s better to take a step back and evaluate whether the task is actually worth the effort. Always be willing to adjust and, more importantly, have the courage to stop what you’re doing and move on.

  2. Use whatever tool works for you or your company. Don’t fight over tools or programming languages. In the end, it’s all about creating (business) value. Remember that tools are only a means to that end.

  3. Care about your work. First and foremost, ship it! Get both the interesting and the boring tasks done (yes, automation isn’t always fun). Equally important: keep caring. Fix bugs. Help others relying on your work. Don’t automate and forget.
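
One rough way to make the "is it worth the effort?" question from the first rule concrete is a break-even estimate: how many runs of a task it takes before the time invested in automating it is paid back. The following Python sketch is only an illustration. The function name `break_even_runs` and all numbers are hypothetical and not taken from the projects described above, and it deliberately ignores softer benefits such as fewer mistakes or better documentation.

```python
def break_even_runs(automation_hours: float,
                    manual_minutes_per_run: float,
                    automated_minutes_per_run: float = 0.0) -> float:
    """Return the number of runs after which automating a task pays off.

    All parameters are illustrative assumptions:
    - automation_hours: time spent building (and testing) the automation
    - manual_minutes_per_run: time one manual run takes
    - automated_minutes_per_run: human time still needed per automated run
    """
    saved_per_run = manual_minutes_per_run - automated_minutes_per_run
    if saved_per_run <= 0:
        # The automation saves no time per run, so it never pays off.
        return float("inf")
    return automation_hours * 60 / saved_per_run


if __name__ == "__main__":
    # Hypothetical example: three 8-hour days of work to automate a task
    # that takes 10 minutes by hand.
    runs = break_even_runs(automation_hours=24, manual_minutes_per_run=10)
    print(f"Break-even after {runs:.0f} runs")
```

With these made-up numbers, a task that runs once a week would take years to pay back the investment, which is a signal to stop and reconsider. The same task running several times a day would pay off within weeks.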

I’ve learned to live my professional life by the above rules. They were and continue to be a tremendous help to me. I’m sure there are other strategies that work for you. Feel free to share them in the comments below. Thank you.

Update: Hacker News thread